On the exploitation of neural networks to handle the robot-environment interaction: two control strategies

Robotic tasks that require interaction with other bodies are increasingly common in industrial contexts. Manipulators need to interact with the environment compliantly to avoid damage but, at the same time, are often required to track a reference force accurately. Interaction controllers are typically employed for this purpose, but those available in the literature need either manual tuning or a precise model of the environment the robot will interact with. The former requires a great deal of time, while the latter must rely on approximations, which often lead to failure in the actual application. These limitations become especially problematic when the contact environment changes frequently. A high-performance force controller that requires no tuning by the operator and adapts quickly regardless of the environment would be an important improvement in this field. To this end, this work proposes two control strategies that exploit neural networks to handle the forces generated by the contact. A Cartesian impedance controller is designed to give the robot decoupled, compliant dynamics. Both strategies select the impedance control setpoint to achieve the best possible force tracking. The first exploits an ensemble of neural networks to estimate the force generated between the robot end-effector and the environment. This force estimator is used to correct a base force controller, which guarantees stability, and thereby to optimize its performance. The second adapts an existing reinforcement learning-based approach to this new use case, which required improvements and modifications. The resulting strategy is an Actor-Critic Model Predictive Force Controller (ACMPFC), in which the actor exploits the model approximator and the critic to select the best setpoint.
The strategies have been implemented and tested in the MuJoCo dynamic simulator and in a real-world scenario, in both cases using a Franka Emika Panda robot as the test platform. The first strategy reduces the force-tracking Mean Squared Error while requiring little setup time. The second shows good performance and fast convergence compared with classical reinforcement learning strategies.
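
As a purely illustrative aid, the idea behind the first strategy (a learned force estimate assisting a base force controller that acts on the impedance setpoint) can be sketched in one dimension. This is not the controller developed in this work: the quasi-static spring contact model, the stiffness values, the integral gain `ki`, the linear stand-ins used in place of neural networks, and all function names are assumptions introduced only for this example.

```python
from statistics import mean

# All numeric values below are assumptions chosen for illustration.
K_IMP = 1000.0  # impedance controller stiffness [N/m]
K_ENV = 5000.0  # true environment stiffness [N/m], unknown to the controller
X_ENV = 0.0     # position of the environment surface [m]


def contact_force(x_d):
    """Quasi-static contact force for setpoint x_d (two springs in series)."""
    depth = x_d - X_ENV
    if depth <= 0.0:
        return 0.0  # setpoint above the surface: no contact
    k_eq = K_IMP * K_ENV / (K_IMP + K_ENV)
    return k_eq * depth


# Stand-in ensemble: linear force models with different stiffness guesses.
# In the text the ensemble members are neural networks; linear models are
# swapped in here only to keep the sketch short.
ENSEMBLE = [lambda x, k=k: k * max(x - X_ENV, 0.0) for k in (700.0, 800.0, 900.0)]


def ensemble_force(x_d):
    """Force estimate obtained by averaging the ensemble predictions."""
    return mean(model(x_d) for model in ENSEMBLE)


def initial_setpoint(f_ref):
    """Invert the (linear) ensemble estimate to guess a starting setpoint."""
    probe = 1e-3  # small test penetration [m]
    k_hat = ensemble_force(X_ENV + probe) / probe
    return X_ENV + f_ref / k_hat


def track_force(f_ref, x_d0, steps, ki=2e-4):
    """Base integral force controller acting on the impedance setpoint."""
    x_d = x_d0
    for _ in range(steps):
        x_d += ki * (f_ref - contact_force(x_d))
    return contact_force(x_d)


if __name__ == "__main__":
    f_ref = 10.0
    print("base controller only:", track_force(f_ref, X_ENV, steps=10))
    print("with ensemble guess: ", track_force(f_ref, initial_setpoint(f_ref), steps=10))
```

In this toy setting the base integral loop alone still converges to the reference force, which mirrors the role of the base controller as the component that guarantees stability; the ensemble estimate only shortens the transient by starting closer to the right setpoint.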

Robotic tasks that involve interaction with other bodies are of primary importance and increasingly in demand in industry. Manipulators must interact with the environment compliantly, to avoid damage to the environment and to the robot itself, while often also tracking a reference force accurately. To this end, interaction controllers are typically used, but those in the state of the art require either fine tuning or a precise model of the environment. The former is time-consuming, while the latter often relies on necessary approximations, which frequently degrade performance during task execution. This becomes even more problematic when the environment the robot comes into contact with must change often. A high-performance force controller could be a key improvement in this field, especially if it required no manual fine tuning by the operator and adapted quickly to a new environment. With this goal in mind, this work proposes two strategies, based on neural networks, to estimate the forces that will be generated during contact. A Cartesian impedance controller has been implemented to give the robot compliant dynamics that are decoupled across directions. Both strategies provide the impedance controller with the desired position needed to reach the desired reference force. The first strategy uses an ensemble of neural networks to estimate the force generated in the robot-environment contact. This estimate is used to correct a base force controller, which guarantees system stability, and to improve its performance. The second strategy consists of a predictive force control technique based on an Actor-Critic approach.
The strategies have been implemented and tested in simulation with MuJoCo and in a real case, both with a Franka Emika Panda robot. A reduction in the tracking mean squared error was achieved using the first control strategy, which also requires little setup time to become operational. With the second, fast convergence and good performance were obtained compared with the reinforcement learning-based implementations available in the literature.