Semantic-aware sampling for robust multi-model fitting

Robust multi-model fitting is the task of finding a series of models that best fit a set of data corrupted by noise and outliers. It is a ubiquitous problem in Computer Vision, where organizing unstructured visual data points into high-level geometric structures is a basic and necessary step toward a better description and understanding of a scene. Since the sampling phase is a crucial step in robust multi-model fitting, in this work we present a novel sampling strategy, termed semantic-aware sampling. In Computer Vision, visual data come from pictures or from frames of a video sequence, yet state-of-the-art robust estimators are agnostic to the visual semantics of the points: they treat visual data merely as geometric locations in an abstract space. By contrast, we exploit the information that the input points have been extracted from one or more pictures. This enhancement of the sampling process improves the performance of robust estimators while reducing the number of required iterations. We propose to analyse the images by combining two approaches: a hand-crafted approach, which extracts a set of corresponding points, and a data-driven approach, which yields a probability map, termed semantics, that guides sampling toward promising regions containing foreground objects rather than background. Experiments show that this simple yet powerful approach significantly reduces the error of state-of-the-art robust estimators, thus improving model estimation.
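To make the idea of a probability map guiding the sampling concrete, here is a minimal sketch of semantic-weighted minimal-sample-set drawing. It is not the method described in this work; it assumes a hypothetical setup where the semantic map is an H×W array of foreground probabilities aligned with the image, points are given as (x, y) pixel coordinates, and each minimal sample set is drawn without replacement with probability proportional to the map value at each point. The helper names `semantic_weights` and `sample_minimal_sets`, the nearest-pixel lookup and the small `eps` floor are all illustrative choices.

```python
import numpy as np


def semantic_weights(points, prob_map, eps=1e-6):
    """Turn a semantic probability map into per-point sampling weights.

    Assumption: prob_map has shape (H, W) with values in [0, 1] and
    points has shape (N, 2) holding (x, y) pixel coordinates.
    """
    h, w = prob_map.shape
    # Nearest-pixel lookup, clamped to the image borders.
    cols = np.clip(np.round(points[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(points[:, 1]).astype(int), 0, h - 1)
    # eps keeps background points reachable, just very unlikely.
    weights = prob_map[rows, cols] + eps
    return weights / weights.sum()


def sample_minimal_sets(points, prob_map, set_size, n_sets, rng=None):
    """Draw n_sets minimal sample sets, biased toward foreground points."""
    rng = np.random.default_rng(rng)
    p = semantic_weights(points, prob_map)
    # Each set holds set_size distinct point indices (no repeats within a set).
    return np.stack([
        rng.choice(len(points), size=set_size, replace=False, p=p)
        for _ in range(n_sets)
    ])
```

For example, with a map whose right half is foreground, the returned index sets concentrate on points lying in that half, so hypotheses are generated mostly from points on foreground objects. For two-view correspondences one could combine the map values looked up in both images; that extension is likewise an assumption, not something stated here.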
