6 degrees of freedom pose estimation with differentiable rendering

Six-degrees-of-freedom (6DoF) pose estimation is a crucial task in computer vision. It consists of estimating the parameters that describe the translation and rotation of an object with respect to a coordinate system. The task is prominent in several fields, including robot manipulation, autonomous driving, scene reconstruction, augmented reality, and aerospace. Two methods prevail: direct regression of the object's pose from the input image, and regression of the object's keypoints from the input image followed by a Perspective-n-Point algorithm that recovers the pose. Both have shown great results in different areas, but both have drawbacks. Direct pose regression is usually performed by a deep neural network, which requires a large amount of data to train correctly. Keypoint regression, in contrast, requires costly annotations that few publicly available datasets provide. In this work we propose a new method that addresses the task using differentiable rendering. First, we reconstruct the 3D model of an object with a differentiable rendering technique. Then, we use this model to enrich our dataset with new images and useful annotations, and regress an initial estimate of the six degrees of freedom. Finally, we refine this coarse pose with a render-and-compare approach based on differentiable rendering. We tested our method on ESA's Pose Estimation Challenge using the SPEED dataset. Our approach achieves competitive results both on the benchmark challenge and in enhancing the performance of existing state-of-the-art algorithms.

Six-degrees-of-freedom pose estimation is a crucial component of computer vision. The goal of this procedure is to estimate the parameters that describe the translation and rotation of an object with respect to a coordinate system. The procedure is fundamental in many fields, such as robot manipulation, autonomous driving, scene reconstruction, augmented reality, and aerospace. The two classical methods for regressing the six degrees of freedom are direct regression of the parameters from the image with a neural network, and regression of the positions of the depicted object's keypoints followed by a Perspective-n-Point algorithm that yields the translation and rotation parameters. These methods have proven effective in several areas, but both have drawbacks. Direct regression of rotation and translation with a neural network requires a very large amount of data to obtain a robust result. Keypoint regression, in contrast, requires annotations that are hard to obtain and are therefore absent from many available datasets. In this work we propose a new method for obtaining these values using a differentiable renderer. First, we reconstruct the 3D model of an object with a differentiable renderer. Then, we use this model to enrich our dataset with new images and annotations, and regress an initial estimate of the six degrees of freedom. Finally, we improve this estimate with a render-and-compare approach that exploits a differentiable renderer. We tested this method on ESA's Pose Estimation Challenge using the SPEED dataset. Our approach obtained competitive results both on the benchmark challenge and in improving an existing state-of-the-art algorithm.
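
To make the quantities in the abstract concrete, the following is a minimal sketch of what a 6DoF pose encodes and of the kind of discrepancy a render-and-compare step could try to reduce. It is not the method of this work: there is no renderer here, only a pinhole projection of 3D model points, and the function names, the (w, x, y, z) quaternion convention, and the camera parameters are all assumptions chosen for illustration.

```python
import math

def quat_to_matrix(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix.

    The (w, x, y, z) ordering is an assumption of this sketch.
    """
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalise to a unit quaternion
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def project(points, q, t, focal=1.0, center=(0.0, 0.0)):
    """Project 3D model points into the image for the pose (q, t).

    Uses a hypothetical pinhole camera: rotate, translate, then divide by depth.
    """
    R = quat_to_matrix(q)
    pixels = []
    for p in points:
        cam = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        u = focal * cam[0] / cam[2] + center[0]
        v = focal * cam[1] / cam[2] + center[1]
        pixels.append((u, v))
    return pixels

def reprojection_error(points, observed, q, t, **camera):
    """Mean pixel distance between projected and observed points.

    A stand-in for the image-space comparison in render-and-compare:
    a better pose estimate gives a smaller value.
    """
    predicted = project(points, q, t, **camera)
    total = sum(
        math.hypot(pu - ou, pv - ov)
        for (pu, pv), (ou, ov) in zip(predicted, observed)
    )
    return total / len(points)
```

In this sketch the three translation components and the three rotational degrees of freedom carried by the unit quaternion together form the six degrees of freedom. A refinement loop would adjust `q` and `t` to lower `reprojection_error`; with a differentiable renderer, the comparison would be made on rendered images rather than on projected points.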