Deep reinforcement learning to enhance fly-around guidance for uncooperative space objects smart imaging

Driven by several potential applications, leading space agencies are increasingly investing in the gradual automation of space missions. Autonomous flight operations may be a key enabler for large-scale, sustainable on-orbit servicing, assembly and manufacturing (OSAM) missions, carrying several inherent benefits, including cost and risk reduction. Within the spectrum of proximity operations, this work focuses on autonomous path planning for the reconstruction of the geometry and inertia properties of an uncooperative target. This autonomous navigation problem is known as active SLAM (Simultaneous Localization and Mapping) and has been widely studied in robotics. Active SLAM may be formulated as a Partially Observable Markov Decision Process (POMDP), based on the idea that an agent can simultaneously determine its location within the environment and plan the best route to gather mission information. Previous works in astrodynamics have demonstrated that it is possible to use Reinforcement Learning (RL) techniques to teach an agent moving along a predetermined orbit when to collect measurements so as to optimize a given mapping goal. In this work, different RL methods are explored to develop an artificial intelligence agent capable of planning sub-optimal paths for autonomous shape reconstruction of an unknown and uncooperative object via imaging. Proximity orbit dynamics are linearized and include orbit eccentricity. The geometry of the target object is represented by a polyhedron with a triangular mesh. Artificially intelligent agents are created using both Deep Q-Network (DQN) and the more recent and promising Advantage Actor-Critic (A2C) method. State-action value functions are approximated using Artificial Neural Networks (ANN) and trained according to RL principles. The core of this analysis is the training of the RL agent architecture under fixed or random initial environment conditions.
The agent must also learn to monitor the Sun orientation, which is essential to achieving a good-quality map. A large database of training tests has been collected. The tests show promising performance in reaching extended coverage of the target (computed as the total number of successful observations for each face of the polyhedron mesh). In particular, the selected RL agents display higher mapping performance than agents that behave randomly. This work therefore provides a preliminary demonstration of the applicability of RL to autonomous imaging of an uncooperative space object, setting a baseline for future work in this innovative field of space engineering.
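
As an illustration of the coverage metric mentioned above, the following minimal sketch counts successful observations per face of a triangular mesh. It is not the implementation used in this work: the function names, the tetrahedron test mesh, and the success criterion are assumptions made for the example. A face is counted as observed when its outward normal points toward both the camera and the Sun, which presumes a convex target with no self-occlusion and ignores field-of-view and incidence-angle limits.

```python
import numpy as np

def face_normals_and_centroids(vertices, faces):
    """Unit outward normals and centroids of a triangular mesh.

    Assumes each face lists its vertices counter-clockwise when seen from outside.
    """
    tri = vertices[faces]                                   # shape (F, 3, 3)
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    centroids = tri.mean(axis=1)
    return normals, centroids

def observed_faces(vertices, faces, cam_pos, sun_dir):
    """Boolean mask of faces that are both camera-facing and sunlit.

    cam_pos: camera position in the target frame (hypothetical input).
    sun_dir: unit vector from the target toward the Sun (hypothetical input).
    """
    normals, centroids = face_normals_and_centroids(vertices, faces)
    to_cam = np.asarray(cam_pos, dtype=float) - centroids
    faces_camera = np.einsum("ij,ij->i", normals, to_cam) > 0.0
    sunlit = normals @ np.asarray(sun_dir, dtype=float) > 0.0
    return faces_camera & sunlit

# Hypothetical target: a unit tetrahedron with outward-oriented faces.
vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])

d = np.ones(3) / np.sqrt(3.0)
# Two (camera position, Sun direction) samples along an illustrative fly-around.
samples = [(np.array([5.0, 5.0, 5.0]), d),
           (np.array([-5.0, -5.0, -5.0]), -d)]

counts = np.zeros(len(faces), dtype=int)    # successful observations per face
for cam_pos, sun_dir in samples:
    counts += observed_faces(vertices, faces, cam_pos, sun_dir)

coverage = np.count_nonzero(counts) / len(faces)   # fraction of faces seen at least once

# Camera-facing but unlit: the same viewpoint with the Sun behind the target.
backlit = observed_faces(vertices, faces, np.array([5.0, 5.0, 5.0]), -d)
```

The backlit case shows why the Sun orientation matters in this simplified model: the camera geometry is unchanged, yet no face counts as a successful observation.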

In recent years, the main space agencies have increased their investment in the gradual automation of space missions, so as to enable large-scale, sustainable on-orbit servicing, assembly and manufacturing missions, with benefits such as cost and risk reduction. Within the spectrum of possible proximity operations, this work focuses on the autonomous planning of a trajectory aimed at reconstructing the geometry of an uncooperative object. The autonomous navigation problem is called active SLAM (Simultaneous Localization and Mapping) and has been widely studied in robotics. It can be formulated mathematically as a Partially Observable Markov Decision Process (POMDP). Some works in astrodynamics have shown that Reinforcement Learning (RL) techniques can be used to teach an agent moving along predetermined orbits when to collect the measurements needed to complete the mapping mission. In this work, several RL methods are used to develop an artificial intelligence capable of planning sub-optimal trajectories for the autonomous shape reconstruction of an unknown, uncooperative object through image processing techniques. The proximity orbital dynamics are linearized and include eccentricity. The geometry of the object is given by a polyhedron modeled with a triangular mesh. The agents are created using both the Deep Q-Network (DQN) technique and the newer and promising Advantage Actor-Critic (A2C) method. The state-action function is approximated using Artificial Neural Networks (ANN), which are trained according to RL principles. The core of this analysis is precisely the training of an agent under fixed or random initial conditions.
The agent must also learn to monitor the Sun orientation, an essential parameter for achieving a good-quality map. A substantial database has been collected: the tests show promising performance in reaching extended coverage of the object. In particular, the agents reach a higher level of mapping than agents that instead act randomly. This preliminary work shows that RL techniques can be applied to the autonomous imaging of uncooperative space objects, thereby providing a basis for future work in this innovative field of space engineering.