Digital twin prediction update synchronization problem of unreliable production lines: new reinforcement learning approach
Pederzoli, Michele
2024/2025
Abstract
The digital transformation driven by Industry 4.0 places the Digital Twin (DT) at the center of simulation, prediction, and optimization of physical systems. The effectiveness of a DT depends on its ability to remain synchronized with the real system, balancing predictive accuracy, which requires frequent and costly updates, against operational cost reduction, which favors less frequent updates but may degrade forecast quality. This thesis addresses the predictive synchronization of DTs in unreliable production lines, characterized by random machine failures and repairs. The problem is formalized in terms of state-dependent policies, with the Sample Path method as a reference. To overcome its limitations, the research reformulates the problem as a Markov Decision Process (MDP), enabling the use of Reinforcement Learning (RL) techniques. Both a model-based approach, Value Iteration (VI), used as a theoretical benchmark, and model-free methods, SARSA and Q-Learning (QL), were implemented on a simplified line model in which machines alternate with finite buffers. The results show that VI provides a robust reference, highlighting an inverse relationship between synchronization frequency and bias cost that remains stable for intermediate costs. SARSA did not produce coherent policies, while Q-Learning demonstrated flexibility. Among the conservative, balanced, and aggressive configurations, the first replicates the Sample Path at higher cost, the second reacts better to prediction errors, and the third maximizes accuracy at the expense of operational costs, proving most effective in dynamic environments. Comparisons show that Q-Learning significantly reduces bias under varying conditions, albeit with a higher synchronization frequency, while the Sample Path offers stable performance in well-defined scenarios. Tests on the bottleneck machine's position and efficiency confirm these findings. The thesis demonstrates the potential of RL methods to improve synchronization efficiency and to support the development of adaptive DTs, with future work including the extension to Deep RL and adaptive reward functions for more complex systems.
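For readers unfamiliar with the model-free approach mentioned in the abstract, the sketch below shows how a tabular Q-Learning agent could be wired to a synchronize / do-not-synchronize decision for a digital twin. It is a minimal illustration under stated assumptions, not the thesis's implementation: the state encoding (a discretized prediction bias), the toy environment `ToyDTSyncEnv`, and the cost parameters `sync_cost` and `bias_cost` are all assumptions made for this example.

```python
import random

# --- Illustrative toy environment (an assumption for this sketch, not the thesis model) ---
# State: discretized accumulated prediction bias of the digital twin (0..MAX_BIAS).
# Action: 0 = keep the current prediction, 1 = synchronize the DT with the real line.
MAX_BIAS = 10

class ToyDTSyncEnv:
    def __init__(self, sync_cost=1.0, bias_cost=0.3, drift_prob=0.6):
        self.sync_cost = sync_cost    # operational cost of one synchronization
        self.bias_cost = bias_cost    # cost per unit of prediction bias per step
        self.drift_prob = drift_prob  # chance the physical line drifts away from the DT
        self.bias = 0

    def reset(self):
        self.bias = 0
        return self.bias

    def step(self, action):
        cost = 0.0
        if action == 1:               # synchronize: pay the update cost, reset the bias
            cost += self.sync_cost
            self.bias = 0
        if random.random() < self.drift_prob:  # random failures/repairs let the bias grow
            self.bias = min(self.bias + 1, MAX_BIAS)
        cost += self.bias_cost * self.bias
        return self.bias, -cost       # reward = negative total cost

# --- Tabular Q-Learning with an epsilon-greedy behavior policy (off-policy update) ---
def train(env, episodes=2000, horizon=50, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = [[0.0, 0.0] for _ in range(MAX_BIAS + 1)]  # Q[state][action]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(horizon):
            a = random.randrange(2) if random.random() < epsilon else max((0, 1), key=lambda x: q[s][x])
            s2, r = env.step(a)
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

if __name__ == "__main__":
    q = train(ToyDTSyncEnv())
    # Greedy policy: at which bias level does the agent start synchronizing?
    policy = [int(q[s][1] > q[s][0]) for s in range(MAX_BIAS + 1)]
    print("bias level :", list(range(MAX_BIAS + 1)))
    print("sync action:", policy)
```

Under these assumptions, a state-dependent threshold policy typically emerges: the greedy agent synchronizes only once the accumulated bias exceeds a level set by the relative magnitude of `sync_cost` and `bias_cost`, which mirrors the trade-off between synchronization frequency and bias cost discussed in the abstract.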
| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2025_10_Pederzoli.pdf | Thesis text | 3.38 MB | Adobe PDF | Openly accessible online |
| 2025_10_Pederzoli_Executive_Summary.pdf | Executive Summary | 415.89 kB | Adobe PDF | Openly accessible online |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/243123