Development of a computer vision algorithm for real-time monocular camera localization in a moving vehicle for head-mounted display

This thesis presents a low-frequency, vision-based localization system for a head-mounted display (HMD) worn inside a moving vehicle. Designed for augmented reality (AR) applications, the system aims to improve the driving experience and safety by integrating holographic information into the real world. The research contributes to a larger project on accurate head localization within the vehicle, which uses a sensor fusion approach that combines high-frequency data from Inertial Measurement Units (IMUs) with low-frequency localization from camera-based methods. The specific focus of this thesis is a precise and reliable vision-based solution for the low-frequency component of that framework. The proposed solution is a monocular camera localization algorithm based on a tailored implementation of the Feature-to-Point (F2P) method. Unlike marker-based approaches or methods that require stereo cameras at runtime, the algorithm needs only a single camera and maintains high accuracy without external markers or dual-camera setups. The process begins offline with the generation of keyframes from a pre-collected video: a 3D point cloud is created and stored for each keyframe using a stereo camera, and this is the only stage in which stereo vision is required. At runtime, the custom F2P implementation extracts 2D features from each frame of the real-time input video and matches them against the stored keyframe features, establishing 2D-3D correspondences from which the pose is estimated with the Perspective-n-Point (PnP) algorithm. Additionally, an Optical Flow technique is integrated within a state machine framework, which further improves the robustness of pose estimation and supports stable operation at approximately 60 Hz.
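To make the role of the 2D-3D correspondences concrete, the following minimal Python sketch computes the reprojection error, the quantity that iterative PnP solvers typically minimize when estimating a camera pose. It is an illustration only, not the implementation developed in this thesis: the distortion-free pinhole model, the intrinsics, the 3D points, and both poses are made-up values, and all function names are hypothetical.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the camera z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def project(point_3d, rotation, translation, fx, fy, cx, cy):
    """Project a 3D point into the image with a distortion-free pinhole model."""
    # Transform into the camera frame: X_c = R * X + t
    xc, yc, zc = (
        sum(rotation[i][j] * point_3d[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def mean_reprojection_error(points_3d, points_2d, rotation, translation,
                            fx, fy, cx, cy):
    """Mean pixel distance between observed 2D features and projected 3D points."""
    total = 0.0
    for p3, (u, v) in zip(points_3d, points_2d):
        pu, pv = project(p3, rotation, translation, fx, fy, cx, cy)
        total += math.hypot(pu - u, pv - v)
    return total / len(points_3d)


# Assumed intrinsics (focal lengths and principal point, in pixels).
FX, FY, CX, CY = 800.0, 800.0, 320.0, 240.0

# Made-up 3D points standing in for a stored keyframe point cloud.
keyframe_points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.2),
                   (0.0, 0.5, -0.1), (-0.4, 0.3, 0.1)]

# Made-up "true" camera pose, used here only to synthesize 2D observations.
true_rotation = rotation_z(0.1)
true_translation = (0.05, -0.02, 2.0)
observed_2d = [project(p, true_rotation, true_translation, FX, FY, CX, CY)
               for p in keyframe_points]

# The true pose explains the observations; a shifted pose does not.
error_true = mean_reprojection_error(keyframe_points, observed_2d,
                                     true_rotation, true_translation,
                                     FX, FY, CX, CY)
error_wrong = mean_reprojection_error(keyframe_points, observed_2d,
                                      true_rotation, (0.10, -0.02, 2.0),
                                      FX, FY, CX, CY)
print(f"error at true pose:    {error_true:.3f} px")
print(f"error at shifted pose: {error_wrong:.3f} px")
```

In a real pipeline the observations would come from feature matching rather than from a known pose, and a robust PnP solver would search for the pose with the lowest error while rejecting outlier matches.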

This thesis presents the development of a low-frequency visual localization system for a head-mounted display (HMD) in a moving vehicle, designed for augmented reality (AR) applications that aim to improve the driving experience and its safety by providing essential information through holograms integrated into the real world. The project is part of a broader effort to achieve accurate head localization inside the vehicle by combining high-frequency data from Inertial Measurement Units (IMUs) with low-frequency data derived from cameras. The specific contribution of this thesis is a precise and reliable solution for low-frequency localization, based on a monocular camera localization algorithm that uses an optimized implementation of the Feature-to-Point (F2P) method. Unlike approaches based on markers or on the use of stereo cameras at runtime, this innovative monocular approach achieves high accuracy without requiring external markers or dual-camera setups. The process begins offline with the generation of keyframes from a pre-recorded video, during which the 3D points of each keyframe are created and stored using a stereo camera; this is the only stage in which such equipment is needed. At runtime, the presented F2P algorithm extracts 2D features from the real-time frames of the input video and compares them with those of the stored keyframes, producing 2D-3D correspondences that allow the pose to be estimated with the Perspective-n-Point (PnP) algorithm. In addition, an Optical Flow technique is integrated within a state machine, improving the robustness of pose estimation and ensuring stable operation at approximately 60 Hz.
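The abstract does not specify the states or transitions of the state machine that combines keyframe matching with Optical Flow. The Python sketch below shows one plausible arrangement under stated assumptions: a hypothetical `RELOCALIZE` mode that matches against stored keyframes and a hypothetical `TRACK` mode that follows points with optical flow, switching on invented thresholds. The mode names, the thresholds, and the transition rules are all illustrative and should not be read as the design used in this thesis.

```python
from enum import Enum


class Mode(Enum):
    RELOCALIZE = "relocalize"  # hypothetical: match features against stored keyframes
    TRACK = "track"            # hypothetical: follow known points with optical flow


# Invented thresholds, chosen only for illustration.
MIN_PNP_INLIERS = 20      # inliers needed to trust a keyframe-based pose
MIN_TRACKED_POINTS = 30   # tracked points needed to keep using optical flow


def next_mode(mode, pnp_inliers, tracked_points):
    """Return the mode for the next frame given the current frame's statistics."""
    if mode is Mode.RELOCALIZE:
        # Start tracking only once keyframe matching produced a well-supported pose.
        return Mode.TRACK if pnp_inliers >= MIN_PNP_INLIERS else Mode.RELOCALIZE
    # In TRACK mode, fall back to keyframe matching when too few points survive.
    return Mode.TRACK if tracked_points >= MIN_TRACKED_POINTS else Mode.RELOCALIZE


def run(observations, mode=Mode.RELOCALIZE):
    """Feed per-frame (pnp_inliers, tracked_points) pairs through the machine."""
    history = []
    for pnp_inliers, tracked_points in observations:
        mode = next_mode(mode, pnp_inliers, tracked_points)
        history.append(mode)
    return history


# Made-up per-frame statistics: (PnP inliers, points tracked by optical flow).
frames = [(5, 0), (40, 0), (0, 80), (0, 10), (35, 0)]
history = run(frames)
print([m.name for m in history])
```

The point of such a split is that per-frame tracking is usually cheaper than matching every frame against the keyframe database, while the fallback to keyframe matching bounds the drift that tracking alone would accumulate.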