Enhancing visual competencies for tool affordances in a humanoid robot

Robots have become an increasingly common presence in daily life thanks to recent leaps in technology. An important goal of robotics is to endow robots with the ability to operate autonomously and to cooperate with humans. An essential prerequisite for this is the ability to interact with objects in the environment, which requires the robot to recognize and manipulate them. This work presents a computer vision architecture that interacts with a pre-existing tool affordances system, giving the robot the ability to observe a scene and to recognize and classify tools through a 2D image segmentation algorithm. Furthermore, by means of an RGB-D camera, a 3D model of each object is reconstructed, from which features such as shape, pose and dimensions are extracted. The algorithm also finds the best part to grasp, selects it as the handle, and calculates its position. The extracted features are sent to the affordances system, which returns the most suitable tool for the chosen action. Thanks to the 3D position of its handle, this tool can be grasped autonomously and the required action can be performed. The architecture was validated both in the simulator and on the real robot, and the results confirmed its effectiveness. The analysis covers the matches obtained during the computation of the 3D model and the correct identification of the handle across different positions and orientations of the tools. The architecture developed and the results reported in this thesis show that enhancing visual competencies enables the robot to perform actions in a more generic and autonomous way and to adapt to increasingly realistic environments.
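As a rough illustration of the feature-extraction step summarized above, the sketch below back-projects the depth pixels of one segmented object into 3D and estimates its pose and dimensions with a principal component analysis. This is not the method used in the thesis, which the abstract does not specify. The function names, the pinhole-camera intrinsics (`fx`, `fy`, `cx`, `cy`) and the use of PCA are all assumptions made for the example.

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy):
    """Turn masked depth pixels (in metres) into an (N, 3) point array.

    Assumes an ideal pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    valid = mask.astype(bool) & (depth > 0)   # ignore missing depth readings
    v, u = np.nonzero(valid)                  # pixel rows (v) and columns (u)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack((x, y, z))

def pca_features(points):
    """Return centroid, principal axes (as columns) and extent along each axis."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # longest axis first
    axes = eigvecs[:, order]
    projected = centered @ axes
    extents = projected.max(axis=0) - projected.min(axis=0)
    return centroid, axes, extents

if __name__ == "__main__":
    # Synthetic scene: a flat 40 x 10 pixel object at 1 m from the camera.
    depth = np.ones((20, 60))
    mask = np.zeros((20, 60), dtype=bool)
    mask[5:15, 10:50] = True
    points = backproject(depth, mask, fx=100.0, fy=100.0, cx=30.0, cy=10.0)
    centroid, axes, extents = pca_features(points)
    print("centroid:", centroid)
    print("main axis:", axes[:, 0])
    print("extents:", extents)
```

In a sketch like this, the centroid and principal axes stand in for the object's pose and the extents for its dimensions. A handle candidate could then be sought along the longest axis, although the criterion the thesis uses for selecting the handle is not stated in the abstract.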
