Feasibility study of a deep learning-based monocular pose estimation software on a space-graded hardware
Sgura, Mauro
2023/2024
Abstract
Monocular pose estimation of an uncooperative target is a key problem in computer vision, as it can enable autonomous navigation systems. Monocular systems are particularly attractive due to their minimal resource requirements, which make them well suited to space applications. This technology holds great promise for operations such as In-Orbit Servicing and Active Debris Removal, in which a servicer spacecraft must conduct close-proximity operations with a passive target. In recent years, the popularity and rapid evolution of Deep Learning techniques have driven extensive research into applying these methods to this problem. However, there is limited research on deploying large deep learning models on space-grade hardware, which often has very limited computational capability. The goal of this thesis is therefore to develop a deep learning-based monocular pose estimation pipeline and to test its performance on low-performance hardware, specifically a Raspberry Pi 4. First, two models of the popular Convolutional Neural Network architecture YOLOv8, one for object detection and one for landmark regression, are trained and tested on a labeled dataset of Tango spacecraft images annotated with the relative pose. The smallest models of the YOLOv8 family are chosen to enable real-time inference on the Raspberry Pi's low-power CPU and to determine whether a small network can provide reliable measurements. After training, the two networks are integrated into a two-stage pose estimation pipeline: once the spacecraft keypoint coordinates have been obtained in the image plane, the EPnP algorithm computes the camera's relative position and orientation with respect to the target. The program is developed in C++, using the OpenVINO framework to deploy the two networks.
The pipeline estimates the pose with an error below 20 cm in position and 2° in orientation, using only about 3 million parameters per network, and reaches 2.7 frames per second when the Raspberry Pi is used at full capacity. However, it places a significant load on the CPU, raising its temperature by 20 °C and using nearly 100% of the available computational power. This underscores the current limitations of deploying deep learning-based pose estimation software on board a spacecraft. The developed code is open source and available at https://github.com/masgura/On-board-pose-estimation.
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/223183