Machine learning-based reentry guidance for the ReFEx Mission: analysis and comparison of reinforcement learning and genetic programming

This research applies two Machine Learning (ML) techniques, Deep Reinforcement Learning (DRL) and Genetic Programming (GP), to the task of generating the online guidance command for the REusability Flight EXperiment (ReFEx) vehicle. The focus is on the atmospheric flight phase of the reentry mission, in which the models trained via DRL and GP produce real-time corrections to the reference guidance command computed beforehand. This online update stage is necessary to account for external disturbances and modeling errors. The ML models are tested and validated in the 6 Degrees of Freedom (DOF) high-fidelity simulator developed for the verification and validation of the Guidance, Navigation and Control (GNC) subsystem of the ReFEx mission. Both methods are compared with each other and with the baseline optimization-based guidance algorithm to assess the performance and applicability of ML techniques to a real mission. DRL and GP are chosen for their complementary features: DRL is used to train a Multilayer Perceptron (MLP), which results in a black-box model, whereas GP can produce a human-readable, continuous, and differentiable model that is available as a symbolic expression. Results show that DRL can deliver the same performance as the baseline guidance algorithm, while GP achieves slightly worse but still comparable results. Moreover, both techniques feature online execution times approximately three orders of magnitude shorter than those of the baseline optimization-based strategy.
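The correction scheme described above can be illustrated with a minimal Python sketch. It is not the ReFEx implementation: the additive form of the correction, the two-element state vector, the saturation limit, and every function name and coefficient below are assumptions made only for illustration. The sketch contrasts a black-box MLP policy (the kind of model DRL would train) with a closed-form symbolic policy (the kind of model GP could evolve), both plugged into the same update step.

```python
import math
from typing import Callable, Sequence

State = Sequence[float]
Policy = Callable[[State], float]


def mlp_correction(state: State,
                   w1: Sequence[Sequence[float]], b1: Sequence[float],
                   w2: Sequence[float], b2: float) -> float:
    """Black-box policy: one hidden tanh layer followed by a linear output.

    The weights are placeholders; in the paper they would come from DRL training.
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def symbolic_correction(state: State) -> float:
    """Human-readable policy: a hypothetical symbolic expression.

    The expression and its coefficients are invented for illustration only.
    """
    altitude_error, velocity_error = state
    return 0.1 * altitude_error - 0.05 * math.tanh(velocity_error)


def corrected_command(reference: float, state: State,
                      policy: Policy, limit: float) -> float:
    """Add a saturated online correction to the precomputed reference command.

    Both the additive update and the saturation are assumptions of this sketch.
    """
    correction = max(-limit, min(limit, policy(state)))
    return reference + correction
```

Either policy can be passed to `corrected_command` unchanged, for example `corrected_command(10.0, (0.2, -0.1), symbolic_correction, limit=1.0)`. Both reduce to a handful of arithmetic operations per call, which is consistent with the short online execution times reported relative to an iterative optimization.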

This research applies two Machine Learning (ML) techniques, Genetic Programming (GP) and Deep Reinforcement Learning (DRL), to the task of generating the guidance command for the ReFEx vehicle in real time. The study focuses solely on the atmospheric reentry phase, during which the models trained with these two techniques are used to compute the corrections to be applied in real time to the guidance signal computed before the reentry trajectory begins. This correction phase is necessary to cope with external disturbances and errors in the modeling of the system itself. Both ML models were tested and validated in the high-fidelity simulator developed for the design and validation of the Guidance, Navigation and Control (GNC) subsystem of the ReFEx mission. The two methods were compared with each other and against the standard algorithm developed for the mission, which is based on an iterative optimization process, in order to assess the actual applicability of these techniques to a real mission and their corresponding performance. DRL and GP were chosen for their complementary characteristics: the former was used to train a Multilayer Perceptron (MLP), which is in effect a black-box model, whereas the latter can produce a continuous, differentiable, and human-interpretable model in the form of a symbolic expression. The results show that DRL can deliver the same performance as the standard guidance algorithm, while GP yields slightly lower but still comparable performance. Moreover, both techniques can generate the update command in real time using about one thousandth of the time required by the standard algorithm.