With the introduction of Convolutional Neural Networks (CNNs), object detection and 6D pose estimation from RGB images have seen major advances in accuracy and precision. We investigate the use of CNNs to assess their applicability to perception tasks in industrial and collaborative robotics. In particular, we devise a method for generating realistic training datasets for objects in a predetermined environment starting from photographs, which greatly eases the usually laborious and expensive data acquisition phase that is a prerequisite for machine learning applications. We then train a neural network on several experimental datasets of this kind to evaluate its performance. We also devise an approach to extract the semantic state of a scene from the outputs of a pose estimation network. Finally, we demonstrate the performance of our methods in a real-world scenario by using the output of a trained neural network to plan the movement of a robotic manipulator.
The introduction of Convolutional Neural Networks (CNNs) has set off enormous developments in the fields of object identification and 6D pose estimation from color images. In this thesis we study the use of CNNs to verify their applicability to perception in the fields of industrial and collaborative robotics. In particular, we present a method for generating realistic datasets for training neural networks, considerably easing the laborious data acquisition process that machine learning requires. Having then trained a neural network to verify its accuracy, we develop a method to extract the semantic state of a scene using the pose estimates provided by the network. Finally, we demonstrate the reliability of our methods in a real-world scenario, using their results to plan the movement of a robotic manipulator.
Pose estimation and semantic meaning extraction for robotics using neural networks
FIGUNDIO, DAVIDE
2021/2022
File | Description | Size | Format |
---|---|---|---|
Thesis.pdf | Accessible online to everyone. Uploaded 16/04/2023 | 27.67 MB | Adobe PDF |
Extended Abstract.pdf | Accessible online to everyone. Uploaded 16/04/2023 | 4.08 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/210363