Resilience against random hardware faults is one of the main concerns when designing mission- and safety-critical systems. Such faults are often transient, since they are typically caused by physical phenomena such as vibrations, radiation, or electromagnetic interference. Resilience against this kind of fault is traditionally implemented in hardware, but hardware solutions offer little flexibility, carry a high engineering cost, and affect the energy, thermal, and weight requirements of the system. Software solutions, commonly named Software-Implemented Hardware Fault Tolerance (SIHFT), mitigate transient faults using software-only techniques that are more flexible and lower production costs. This work shows how three SIHFT mechanisms can be implemented in a language- and architecture-independent environment by leveraging the open-source compiler framework LLVM, and how they can be enhanced with novel overhead reduction techniques and detection strategies. Additionally, an open-source real-time operating system is compiled with these mechanisms, both in the kernel and in the real-time tasks, and runs on a physical STM32 board. Doing so required special adaptations, which are discussed in this thesis. Finally, an experimental evaluation shows the effectiveness of the proposed approach in detecting faults, highlighting the differences between the techniques and the tweaks implemented to manage the trade-off between performance overhead and detection capabilities.
One of the greatest challenges in building mission- and safety-critical systems is resilience against random hardware faults. Faults of this kind can be transient, since they are caused by temporary physical phenomena such as vibrations, radiation, and electromagnetic interference. Resilience against them can be achieved with hardware solutions, at the expense of flexibility and production costs. Moreover, such solutions make it harder to meet many non-functional requirements of safety-critical systems, such as energy, weight, or thermal requirements. Software solutions, known as Software-Implemented Hardware Fault Tolerance (SIHFT), can protect against this type of fault while offering greater flexibility and lower production costs. This thesis describes how three of these mechanisms can be implemented in LLVM, an environment independent of both the programming language and the system architecture, and proposes new improvements to both detection and overhead. In addition, the thesis discusses how to compile the kernel and the tasks of a real-time operating system with the aforementioned mechanisms and run it on an STM32 board. Doing so required adapting the system through mechanisms that are discussed in the thesis. Finally, an experimental campaign demonstrates the effectiveness of our framework, the differences between the SIHFT techniques, and the impact of new customization options for managing the trade-off between the degree of detection and the performance penalty.
Hardening safety-critical systems against transient faults via SIHFT compiler transformations
BAROFFIO, DAVIDE
2022/2023
File | Description | Size | Format | Access |
---|---|---|---|---|
Tesi_Baroffio.pdf | Thesis | 1.2 MB | Adobe PDF | Open Access from 30/06/2024 |
Executive_Summary_Baroffio.pdf | Executive Summary | 650.26 kB | Adobe PDF | Open Access from 30/06/2024 |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/212232