Deep monocular autocalibration of radially symmetric wide-angle cameras

Camera calibration is a fundamental step for numerous computer vision applications. It reveals the relationship between image pixels and the locations of the real-world objects that they represent. This allows us to infer object sizes in world units, make distance measurements, and determine the location of a camera in the scene. Without this information, tasks such as 3D reconstruction, autonomous driving, visual localization, image undistortion, and augmented reality would not be as successful. Recently, wide-angle cameras have been employed in these tasks because they capture a much larger portion of the scene than standard cameras. Existing wide-angle calibration methods are typically cumbersome and expensive, as they require a carefully controlled environment in which an object with a precisely known geometric structure is captured multiple times from different angles. In this thesis, we propose a novel solution to the autocalibration problem that estimates the camera parameters from a single image. Our contribution is twofold. (i) We design a new camera representation that is directly related to the content of the image and avoids the use of any explicit mathematical model. Our representation is composed of a set of correspondences between the location of an image pixel and the light ray that reaches it, and can therefore represent any camera. (ii) We devise a deep neural network that estimates the proposed representation by exploiting the radial symmetry of wide-angle cameras, together with a calibration algorithm that converts our representation into standard mathematical camera models. The resulting calibration technique can be employed for a wide range of cameras, thanks to the expressiveness of our representation, and in a broad range of environments, since our solution is trained on a large-scale dataset.
We perform extensive experiments to assess the quality of the calibration results of our method. Moreover, we compare its effectiveness against state-of-the-art methods on images of different kinds, from outdoor to indoor and from urban to natural. The results are significantly superior to those obtained by the best methods in the literature.
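To make the idea of a correspondence-based representation concrete, the following is a minimal sketch, not the method of this thesis. It assumes a radially symmetric camera whose pixel-to-ray correspondences reduce to pairs (radius from the distortion center, incidence angle), and it converts them into one standard model, an odd polynomial in the radius, by ordinary least squares. The function names, the synthetic equidistant camera, the distortion center, and the focal length are all hypothetical choices made for illustration; in practice the correspondences would come from the network's output.

```python
import numpy as np


def radial_profile(pixels, center):
    """Distance of each pixel from the (assumed known) distortion center."""
    return np.linalg.norm(pixels - center, axis=1)


def fit_odd_polynomial(radii, angles, degree=5):
    """Least-squares fit of theta(r) = k1*r + k3*r^3 + ... (odd powers only)."""
    powers = np.arange(1, degree + 1, 2)
    # Normalise the radii so the design matrix stays well conditioned.
    scale = radii.max()
    design = (radii[:, None] / scale) ** powers[None, :]
    coeffs, *_ = np.linalg.lstsq(design, angles, rcond=None)
    # Undo the normalisation so the coefficients apply to radii in pixels.
    return coeffs / scale ** powers, powers


def angle_of(radii, coeffs, powers):
    """Evaluate the fitted model: incidence angle (radians) for each radius."""
    return (radii[:, None] ** powers[None, :]) @ coeffs


# Synthetic correspondences from a hypothetical equidistant camera (r = f * theta).
rng = np.random.default_rng(0)
center = np.array([320.0, 240.0])  # assumed distortion center, in pixels
focal = 200.0                      # assumed focal length, in pixels
pixels = rng.uniform([0.0, 0.0], [640.0, 480.0], size=(500, 2))
radii = radial_profile(pixels, center)
angles = radii / focal

coeffs, powers = fit_odd_polynomial(radii, angles)
recovered_focal = 1.0 / coeffs[0]  # the linear term encodes 1 / f
max_error = np.abs(angle_of(radii, coeffs, powers) - angles).max()
```

Because the representation is only a set of correspondences, the same data could be fitted to a different parametric model by changing the design matrix; the odd polynomial is used here purely as an example of a standard radially symmetric model.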

Having calibrated cameras is a fundamental requirement for a large number of computer vision applications. A calibrated camera reveals the relationship between image pixels and the position, in the real world, of the object they represent. Thanks to this relationship, we can derive object sizes in metric units, measure distances, and determine the position of the camera in the scene. Without this information, applications such as 3D scene reconstruction, autonomous driving, and augmented reality would not be as popular. In recent years, the use of cameras with a wide field of view has grown considerably in these areas of computer vision, precisely because they capture a much larger portion of the scene than standard cameras. Classical methods for calibrating wide field-of-view cameras are often complicated and expensive, since they require a precise and carefully controlled environment that must contain an object with a precise geometric shape, photographed by the camera from different angles. In this thesis we propose an innovative solution to the autocalibration problem, which estimates the parameters of a camera from a single image. Our contribution is twofold. (i) We propose a new camera representation that is directly related to the content of the image and does not explicitly use any mathematical model. This representation consists of a set of correspondences between the position of a pixel in the image and the direction of the light ray that reaches it, and can therefore represent any camera. (ii) We develop a deep neural network that estimates our representation by exploiting the radial symmetry of the cameras, and finally a calibration algorithm that converts our representation into the classical mathematical camera models.
The proposed calibration technique can therefore be used for a wide range of cameras, thanks to the great expressiveness of our representation, and in any type of environment, since the neural network is trained on a large dataset. To evaluate the effectiveness of the proposed solution, we carried out various experiments on scenes of different kinds, both indoor and outdoor, both urban and natural, comparing against state-of-the-art approaches. The results confirm that our solution provides a significant improvement over existing methods.