Bridging the Sim-to-Real Gap: Self-Supervised Trajectory Planning via Visual Perception for Robotic Deburring

This thesis investigates the Reality Gap (RG) between simulated and real environments, using the robotic deburring of shoe soles as a case study. In this process, robots identify and remove unwanted rubber protrusions (burrs) formed during the casting of soles. For learning-based approaches, effective agent training is hindered by the scarcity of real-world data and the large number of training episodes required. Simulations offer a solution, but transferring learned policies to real robots often degrades performance, highlighting the need for Simulation-to-Reality (Sim2Real) techniques to address the RG. This work extends the RG analysis, typically observed in Reinforcement Learning (RL), to other learning domains. It focuses on training visual networks ("Learn to See" phase) and acting networks ("Learn to Act" phase) for robotic deburring. Visual networks, such as Conditional Generative Adversarial Networks (cGANs) and Autoencoders (AEs), detect burrs in sole images and compress them into meaningful information. Acting networks, including a regressor and a simplified RL agent, i.e. a Contextual Bandit (CB), translate this visual information into deburring actions. To enhance learning autonomy, a self-supervised data generation approach leverages the results of the "Learn to See" phase. A custom algorithm, CAMIL (Coordinates Alignment for Mapping Image Lines), is developed to map the identified burr profiles into deburring trajectories. These trajectories offer a potentially straightforward solution to the deburring problem, but their utility extends beyond immediate application: they can also serve as automatically generated ground-truth data to train the acting networks in the "Learn to Act" phase, which eliminates the need for additional data and provides a further means of exploring the Sim2Real challenge in autonomous agent learning.
Overall, this thesis addresses the overarching "Learn to Transfer" challenge by bridging the RG through the various learning stages. The research demonstrates that the trained autonomous networks and agents exhibit consistent performance in simulated and real environments.

Questa tesi affronta il divario tra realtà e simulazione, in inglese Reality Gap (RG), nel contesto della sbavatura robotizzata delle suole di scarpe. Questo è un processo in cui i robot identificano e rimuovono sporgenze di gomma indesiderate (bave) formatesi durante lo stampaggio delle suole. Negli approcci basati sull’apprendimento automatico, l’allenamento efficace di agenti è limitato dalla scarsità di dati reali e dall’alto numero di episodi richiesto. Le simulazioni offrono una soluzione, ma il trasferimento delle politiche apprese alla realtà spesso peggiora le prestazioni, evidenziando la necessità di tecniche note in inglese come Sim2Real, per affrontare la sfida del RG. Questa ricerca estende l’analisi del problema del RG, tipicamente osservato nel Reinforcement Learning (RL), ad altri domini di apprendimento. Nello specifico, si concentra sull’addestramento di reti visive (fase "Learn to See") e reti di azione (fase "Learn to Act") per la sbavatura robotizzata. Reti visive come Conditional Generative Adversarial Networks (cGANs) e Autoencoders (AEs) rilevano le bave e vettorizzano le informazioni sulle loro caratteristiche. Le reti di azione, un regressore e un semplice agente RL, cioè un Contextual Bandit (CB), traducono questi dati visivi in azioni di sbavatura. Per incrementare l’autonomia della fase di apprendimento, è stato adottato un metodo auto-supervisionato (Self-Supervised Learning) che sfrutta i risultati della fase "Learn to See". È stato sviluppato un algoritmo, CAMIL (Coordinates Alignment for Mapping Image Lines), per mappare in traiettorie di sbavatura i contorni delle bave identificate. Queste traiettorie non solo offrono una potenziale soluzione diretta al problema della sbavatura, ma aprono anche nuove opportunità per esplorare la sfida Sim2Real nell’addestramento autonomo degli agenti, potendo essere usate come riferimento per allenare le reti di azione nella fase "Learn to Act".
Nel complesso, questa tesi affronta la sfida "Learn to Transfer" colmando il RG attraverso le diverse fasi di apprendimento. La ricerca dimostra che queste reti e agenti autonomi allenati presentano prestazioni omogenee in ambienti simulati e reali.