Hard real-time distributed systems are increasingly deployed in safety-critical domains such as avionics, automotive, and industrial automation. Ensuring both correctness and availability in these systems poses significant challenges, since correctness also depends on meeting strict timing constraints, particularly in the presence of distributed components. While the literature provides several techniques to address hardware failures in such systems, fewer studies focus on Byzantine failures, and all of them rely on consensus-based mechanisms. This thesis investigates quorum-based approaches for hard real-time distributed systems, aiming to achieve fault tolerance while ensuring that task deadlines are met even under Byzantine failures. Unlike consensus-based methods, the proposed model enables failure detection and recovery without requiring task re-execution, thereby improving efficiency and maintaining determinism. Furthermore, we show how the quorum-based approach can also be extended to handle hardware failures. The model has been evaluated using OMNeT++, an open-source network simulator, integrated with the INET framework, which provides a comprehensive set of communication protocols, components, and network models. The experimental results highlight both the strengths and current limitations of the proposed solution, while also outlining directions for future improvements. Overall, this work contributes a novel foundation for research on fault-tolerant hard real-time distributed systems.
A quorum-based approach for hard real-time distributed systems
VILLA, MATTEO
2024/2025
| File | Description | Access | Size | Format |
|---|---|---|---|---|
| 2025_10_Villa_Tesi_01.pdf | Thesis text | Openly accessible online | 998.98 kB | Adobe PDF |
| 2025_10_Executive_Summary_02.pdf | Executive Summary | Openly accessible online | 481.26 kB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/243699