Sim-to-real flattening of a rectangular cloth with a robotic arm through deep reinforcement learning

In robotics, the manipulation of rigid objects is widespread and deeply studied, and almost any task can be taught directly. By contrast, the handling of deformable objects remains a largely unexplored field, because of their unpredictable behaviour and the intrinsic difficulty of manipulating them. The need for extremely complex models certainly limits their investigation, but recent Machine Learning theory offers an efficient alternative: model-free approaches overcome most of these constraints and help to develop robust and practical techniques. The present work exploits Reinforcement Learning, a branch of Machine Learning, to build a system that learns to flatten a towel, in the simplified case in which one side is pinned, using a single robotic manipulator arm and a vision system. The dissertation reports the sequence of steps undertaken to develop the whole process. A virtual environment is built with a 3D physics engine to simulate the manipulation of the cloth, together with a framework that manages this environment and links it to the learning core. The Deep Reinforcement Learning algorithm, which relies on neural networks and a gradient-based update process, learns autonomously how to behave during a training phase in which experience is gained step by step through trial and error. Training is carried out over a long sequence of episodes, first in a very simple environment that reproduces only the geometric shape of the towel, and then in the 3D scenario, which recreates a deformable object. In simulation, the agent learns to accomplish the task with a 75% win rate.
The model is then validated experimentally in a real-world scenario, where the towel is successfully flattened in 52% of the experiments, which included a large number of intricate configurations; the success rate rises to 70% for moderately complicated layouts. A possible cause of this gap lies in the simulated environment, which allowed only a limited initial randomisation of the towel.
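The two-stage, trial-and-error training summarised above (first a simplified geometric environment, then a harder one, scored by win rate) can be illustrated with a deliberately small sketch. It is not the thesis's method: it swaps the deep neural network for plain tabular Q-learning, and the cloth for a hypothetical `CornerEnv` grid that tracks only the offset of the towel's free corner from its flattened position, with the initial randomisation radius standing in for layout difficulty. `CornerEnv`, the rewards and all hyperparameters are illustrative assumptions.

```python
import random

# Four moves of the towel's free corner, one grid cell each.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


class CornerEnv:
    """Hypothetical toy environment (not the thesis's simulator): the state is
    the (dx, dy) offset of the free corner from its flattened position;
    (0, 0) means 'flat'."""

    def __init__(self, init_radius, bound=3, max_steps=20, seed=0):
        self.init_radius = init_radius  # how strongly the start layout is randomised
        self.bound = bound              # the corner cannot leave [-bound, bound]^2
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        r = self.init_radius
        while True:  # resample so an episode never starts already flat
            self.pos = (self.rng.randint(-r, r), self.rng.randint(-r, r))
            if self.pos != (0, 0):
                break
        self.steps = 0
        return self.pos

    def step(self, action):
        def clip(v):
            return max(-self.bound, min(self.bound, v))

        dx, dy = ACTIONS[action]
        self.pos = (clip(self.pos[0] + dx), clip(self.pos[1] + dy))
        self.steps += 1
        flat = self.pos == (0, 0)
        reward = 1.0 if flat else -0.01  # small step cost favours short solutions
        done = flat or self.steps >= self.max_steps
        return self.pos, reward, done, flat


def qvals(q, s):
    """Q-values of state s, created as zeros on first visit."""
    return q.setdefault(s, [0.0] * len(ACTIONS))


def greedy(q, s):
    values = qvals(q, s)
    return max(range(len(ACTIONS)), key=values.__getitem__)


def train(q, env, episodes, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; q is updated in place."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            explore = rng.random() < epsilon
            a = rng.randrange(len(ACTIONS)) if explore else greedy(q, s)
            s2, r, done, flat = env.step(a)
            # Reaching the flat state is terminal; a timeout is not, so bootstrap.
            target = r if flat else r + gamma * max(qvals(q, s2))
            qvals(q, s)[a] += alpha * (target - qvals(q, s)[a])
            s = s2
    return q


def win_rate(q, env, episodes=200):
    """Fraction of greedy episodes that end with the towel flat."""
    wins = 0
    for _ in range(episodes):
        s, done, flat = env.reset(), False, False
        while not done:
            s, _, done, flat = env.step(greedy(q, s))
        wins += flat
    return wins / episodes


if __name__ == "__main__":
    q = {}
    train(q, CornerEnv(init_radius=1, seed=1), episodes=300)   # stage 1: simple layouts
    train(q, CornerEnv(init_radius=3, seed=2), episodes=2000)  # stage 2: harder layouts
    print(f"greedy win rate: {win_rate(q, CornerEnv(init_radius=3, seed=3)):.2f}")
```

Carrying the same Q-table from stage 1 into stage 2 is what makes this a curriculum: values learned on easy layouts give the harder stage a head start. Evaluating on a larger `init_radius` than was used in training is one way to probe the kind of limited-randomisation gap the abstract points to.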
