3D position estimation using deep learning

POLITesi - Digital archive of degree and doctoral theses

Estimating the 3D position of an object is one of the most critical topics in computer vision. When the final aim is to create automated solutions that can detect and localize objects in images, new high-performing models and algorithms are needed. Because a single 2D image lacks the relevant information, approximating the 3D position of an object is a complex problem. This thesis investigates the specific task of estimating the 3D position of a soccer ball. It describes a method based on two deep learning models that tackle this task: the ball net and the temporal net. The former is a deep convolutional neural network designed to extract meaningful features from the images, while the latter exploits temporal information to reach a more robust prediction. This solution achieves a lower Mean Absolute Error than existing computer vision methods under different conditions and configurations. A new data-driven pipeline has also been created to process videos and extract the 3D information of an object.
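For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how a Mean Absolute Error over 3D positions could be computed. The abstract does not say how the thesis aggregates the error across coordinates or frames, so averaging over all three axes is an assumption here, and the function name `mean_absolute_error_3d` and the sample coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def mean_absolute_error_3d(
    predicted: Sequence[Point3D], actual: Sequence[Point3D]
) -> float:
    """Average absolute per-coordinate error over all frames.

    Assumption: errors on x, y and z are weighted equally; the thesis
    may aggregate them differently.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        # Sum the absolute differences on the three axes of one frame.
        total += sum(abs(pc - ac) for pc, ac in zip(p, a))
    return total / (3 * len(predicted))


# Hypothetical ball positions (illustrative values, not thesis data).
predicted = [(1.0, 2.0, 0.5), (1.5, 2.5, 1.0)]
actual = [(1.0, 2.0, 1.0), (2.0, 2.5, 1.0)]
mae = mean_absolute_error_3d(predicted, actual)
```

A lower value means the predicted positions are, on average, closer to the ground truth on each axis.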



PEDRAZZINI, FILIPPO

2017/2018



Supervisor	BORACCHI, GIACOMO
Co-supervisor(s)	BOMAN, MAGNUS
School / Dept.	ING - Scuola di Ingegneria Industriale e dell'Informazione (School of Industrial and Information Engineering)
Date	25 July 2018
Academic year	2017/2018
Document type	Master's thesis (Tesi di laurea Magistrale)

Attached files

File	Size	Format
FilippoPedrazzini-thesis.pdf (openly accessible on the internet)	11.38 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/141805