Continuous control actions learning with performance specifications through reinforcement learning

Robots are increasingly required to deal with (partially) unknown tasks and must therefore adapt their behavior to the specific operating conditions. Reinforcement learning (RL) holds the promise of autonomously learning new control policies through interaction with the environment. However, RL approaches typically require a large number of samples, particularly for continuous control problems. In this work, a learning-based method is presented that leverages simulation data to learn continuous control actions (for both the robot joints and the gripper's fingers) for an object manipulation task through RL. The control policy is parameterized by a neural network and learned using two model-free RL algorithms. The learning performance is compared between an on-policy algorithm, Proximal Policy Optimization (PPO), and an off-policy algorithm, Soft Actor-Critic (SAC). A dense reward function is designed for the task to enable efficient learning. The main objectives encoded in the proposed reward function are: successfully grasping and lifting the target object; managing the robot's redundancy so that joint limits are avoided; smoothing the control actions so that the learned controller can be implemented on a real robot; and avoiding collisions with the obstacles in the scene. The proposed approach is trained entirely in simulation (using the MuJoCo environment) without any prior knowledge of the task. A grasping task involving a Franka Emika Panda manipulator is considered as the reference task to be learned. The task requires the robot to reach the part, grasp it, and lift it off the contact surface while satisfying the objectives embedded in the reward function. The proposed approach is shown to generalize across multiple object geometries and initial robot/part configurations, learning and re-executing the (partially) modified task. 
Finally, experimental tests are performed on a real Franka Emika Panda robot, showing that the behavior learned in simulation can be transferred to the physical system. The experiments show a 100% success rate on the grasping task, indicating that the proposed approach is applicable to real applications.
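
As an illustration of how the reward objectives listed above could be combined into a single dense signal, the following is a minimal sketch under stated assumptions, not the reward function actually used in this work. All names (`RewardWeights`, `dense_reward`, `joint_limit_penalty`, `smoothness_penalty`), the weights, the margin, and the specific form of each term are hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights, chosen for illustration only (not from the work).
    reach: float = 1.0
    grasp: float = 2.0
    lift: float = 5.0
    joint_limit: float = 0.5
    smoothness: float = 0.1
    collision: float = 10.0


def joint_limit_penalty(q: Sequence[float], q_min: Sequence[float],
                        q_max: Sequence[float], margin: float = 0.1) -> float:
    """Penalty that grows linearly once a joint enters a band near a limit.

    The band is `margin` times the joint range (an assumed design choice).
    """
    penalty = 0.0
    for qi, lo, hi in zip(q, q_min, q_max):
        band = margin * (hi - lo)
        dist = min(qi - lo, hi - qi)  # distance to the nearest limit
        if dist < band:
            penalty += (band - dist) / band
    return penalty


def smoothness_penalty(action: Sequence[float],
                       prev_action: Sequence[float]) -> float:
    """Squared change between consecutive actions (discourages jerky control)."""
    return sum((a - b) ** 2 for a, b in zip(action, prev_action))


def dense_reward(dist_to_object: float, grasped: bool, lift_height: float,
                 target_height: float, q: Sequence[float],
                 q_min: Sequence[float], q_max: Sequence[float],
                 action: Sequence[float], prev_action: Sequence[float],
                 in_collision: bool,
                 w: RewardWeights = RewardWeights()) -> float:
    """Combine the objectives into one scalar reward (assumed additive form)."""
    reward = -w.reach * dist_to_object  # shaping term: approach the object
    if grasped:
        reward += w.grasp
        # Lifting progress, clipped to [0, 1] of the target height.
        reward += w.lift * max(0.0, min(lift_height / target_height, 1.0))
    reward -= w.joint_limit * joint_limit_penalty(q, q_min, q_max)
    reward -= w.smoothness * smoothness_penalty(action, prev_action)
    if in_collision:
        reward -= w.collision
    return reward
```

In a sketch like this, the relative weights decide how the agent trades task success against the additional specifications; in practice they would have to be tuned for the specific task and simulator.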

Robotic systems are increasingly used in applications where they must interact with environments that are not known a priori and where the task to be performed is no longer fully pre-programmed. Consequently, manipulators must be able to adapt to the specific operating conditions. Reinforcement Learning (RL) algorithms enable the autonomous learning of new control policies through interaction with the environment, but they require a very large amount of data to complete the learning. In this thesis, an RL-based methodology is proposed for learning the continuous control actions to be commanded to the robotic system in order to execute a reference task. The learning is performed in simulation and is then transferred to the real robotic system. The reference task concerns the manipulation of components, for which the control actions of the robot joints and of the gripper must be defined and learned. The control policy is parameterized by a neural network. The proposed learning has been implemented with two different RL techniques: Proximal Policy Optimization (PPO; on-policy) and Soft Actor-Critic (SAC; off-policy). A reward function has been defined to guide the learning. The main learning objectives are: successfully grasping the component, successfully lifting the component, managing the robot's redundancy while avoiding joint limits, avoiding discontinuous control actions, and avoiding collisions with the environment in which the robot operates. The approach has been applied in simulation using the MuJoCo software. The task has then been transferred to the Franka Emika Panda robot. The approach has proven to generalize to different initial conditions of the task and to different components to be manipulated, while also satisfying the additional specifications required. 
The transfer of the task to the real robot was finally carried out successfully, with 100% of the tasks completed successfully.