Collaborative robot scheduling based on reinforcement learning in industrial assembly tasks

In classical automation, control systems are frequently driven by PLC logic. In such a sequential process, the problem of choosing between two or more simultaneously available actions may arise. To overcome it, precedence rules are normally used. Sometimes these rules are defined from a priori knowledge of the system; more often, they rely on intuition or implement simple tie-breaking rules with no clear foundation. In this thesis, we address this problem by exploiting the tools offered by the fourth industrial revolution, known as Industry 4.0. We tackle it in a human-robot collaboration (HRC) setting, in which humans and robots work together to achieve a common goal, and we compute a solution using reinforcement learning (RL) techniques. Through trial-and-error interaction with the environment, these techniques learn the "best" action to execute among the simultaneously available ones. To validate them, we designed a use case consisting of an industrial assembly task. The manufacturing task is modeled as a Markov Decision Process (MDP), to which two RL algorithms are applied with the aim of learning the optimal scheduling. Specifically, we analysed the behaviour and performance of Q-Learning with averaged reward and of Delayed Q-Learning in three scenarios of increasing complexity. The manufacturing task performed according to the optimal scheduling is then evaluated through standard industrial metrics. In the third scenario, in which the optimal scheduling is learnt from scratch (ex novo) with the maximum amount of flexibility, the duration of the learning phase proves unsuitable for effective use in industry. Hence, to speed up learning, we developed an application that converts a drawing of a manufacturing task workflow into its digital twin, which simulates the interaction between the agent and the environment.
Finally, within this simulation-based RL framework, we used the two RL algorithms to compute both a static and a dynamic operation assignment, and we tested their adaptability to non-stationary human behaviour.
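To make the scheduling problem concrete, here is a minimal sketch assuming a hypothetical five-operation assembly (`base`, `glue`, `bolt1`, `bolt2`, `cover`). Each operation needs tool `A` or `B`, takes 1 time unit, and a tool change adds 2 more. The precedence graph only determines which operations are enabled in each state; the agent learns which enabled operation to execute next. The sketch uses standard tabular Q-learning with ε-greedy exploration, not the averaged-reward variant studied in the thesis. The task, the durations, and all names (`OPS`, `enabled`, `step`, `q_learning`, `greedy_schedule`) are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical toy assembly (not from the thesis): operation -> (tool, predecessors).
OPS = {
    "base":  ("A", ()),
    "glue":  ("A", ("base",)),
    "bolt1": ("B", ("base",)),
    "bolt2": ("B", ("base",)),
    "cover": ("A", ("glue", "bolt1", "bolt2")),
}
OP_TIME, CHANGE_TIME = 1, 2  # assumed durations in time units


def enabled(state):
    """Operations whose predecessors are all done (the simultaneously available actions)."""
    done, _tool = state
    return [op for op, (_, pre) in OPS.items()
            if op not in done and all(p in done for p in pre)]


def step(state, op):
    """Execute op deterministically; the reward is minus its duration, tool change included."""
    done, tool = state
    need = OPS[op][0]
    duration = OP_TIME + (CHANGE_TIME if need != tool else 0)
    return (done | {op}, need), -duration


def q_learning(episodes=2000, alpha=0.5, gamma=1.0, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)]; 0 is optimistic because rewards are negative
    for _ in range(episodes):
        s = (frozenset(), "A")  # nothing assembled, cell holds tool A
        acts = enabled(s)
        while acts:
            # epsilon-greedy choice among the enabled operations
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            nxt = enabled(s2)
            best_next = max(Q[(s2, b)] for b in nxt) if nxt else 0.0  # terminal value is 0
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s, acts = s2, nxt
    return Q


def greedy_schedule(Q):
    """Follow the learned greedy policy; return the operation order and total time."""
    s, plan, total = (frozenset(), "A"), [], 0
    while enabled(s):
        a = max(enabled(s), key=lambda x: Q[(s, x)])
        s, r = step(s, a)
        plan.append(a)
        total -= r
    return plan, total


plan, total = greedy_schedule(q_learning())
print(plan, total)  # 9 time units is the minimum achievable for this toy task
```

Because the reward is minus the elapsed time, the learned policy groups operations that share a tool instead of relying on a fixed tie-breaking rule.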

In classical automation, control systems are often managed with PLC logic. In this sequential process, the problem of how to choose between two or more simultaneously enabled actions can arise. To solve it, precedence rules are normally used. Sometimes these rules are defined from a priori knowledge of the system; more often, they rely on intuition or on simple tie-breaking mechanisms with no clear foundation. In this thesis, we use the new tools offered by the fourth industrial revolution, known as Industry 4.0, to solve this problem. We address it in the field of collaborative robotics, where humans and robots work together to achieve a common goal. The solution is found using Reinforcement Learning techniques which, through a trial-and-error approach and the analysis of the resulting feedback, learn the best action to execute among those simultaneously enabled. In short, this learning follows a "learn from your mistakes" logic. To validate these techniques, we simulated a typical industrial operation, namely the assembly of a product. The assembly is modeled as a Markov Decision Process, to which we apply two reinforcement learning algorithms with the aim of learning the optimal scheduling of the actions. Specifically, we analysed the behaviour and performance of Q-Learning with an averaged reward function and of Delayed Q-Learning in three scenarios of increasing complexity. The assembly carried out with the optimal scheduling is then evaluated through standard industrial metrics. In the third scenario, where the scheduling is learnt from scratch with the maximum level of flexibility, the learning time proved unsuitable for effective industrial use.
Therefore, to speed up learning, we developed an application that converts the workflow of a manufacturing operation into its digital twin, in which the interaction between agent and environment is simulated. Finally, within this Simulation-based Reinforcement Learning framework, we used the two algorithms to define the scheduling of the actions both statically and dynamically, testing their adaptability to non-stationary human behaviour.
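Delayed Q-Learning, the second algorithm named in the abstract, differs from plain Q-learning in two ways: it starts from optimistic values 1/(1 - γ), and it overwrites Q(s, a) only after collecting m targets for that pair, accepting the change only if it lowers the estimate by at least 2ε1. The sketch below follows the update rule published by Strehl et al. (2006), assuming rewards in [0, 1] and a deterministic toy task with three hypothetical operations. Two of them (`a1`, `a2`) need tool `A` and one (`b1`) needs tool `B`, and each operation that needs no tool change earns reward 1. The task, the parameter values (`gamma`, `m`, `eps1`, `steps`), and the function names are illustrative choices, not the configuration used in the thesis.

```python
from collections import defaultdict

# Hypothetical toy task (not from the thesis): operation -> required tool.
TOOLS = {"a1": "A", "a2": "A", "b1": "B"}
START = (frozenset(), "A")  # nothing done, cell holds tool A


def actions(state):
    done, _tool = state
    return [op for op in TOOLS if op not in done]


def step(state, op):
    """Deterministic transition; reward 1 if no tool change is needed, else 0."""
    done, tool = state
    return (done | {op}, TOOLS[op]), (1.0 if TOOLS[op] == tool else 0.0)


def delayed_q(steps=20000, gamma=0.9, m=5, eps1=0.05):
    """Delayed Q-learning with greedy action selection and episodic resets."""
    Q = defaultdict(lambda: 1.0 / (1.0 - gamma))  # optimistic initial values
    U = defaultdict(float)     # sum of accumulated targets for (s, a)
    count = defaultdict(int)   # l(s, a): targets accumulated since the last attempt
    t_last = defaultdict(int)  # t(s, a): time of the last attempted update
    learn = defaultdict(lambda: True)  # LEARN(s, a) flags
    t_star = 0                 # time of the most recent successful Q change
    s = START
    for t in range(1, steps + 1):
        a = max(actions(s), key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        nxt = actions(s2)
        v_next = max(Q[(s2, b)] for b in nxt) if nxt else 0.0  # terminal state is worth 0
        sa = (s, a)
        if learn[sa]:
            U[sa] += r + gamma * v_next
            count[sa] += 1
            if count[sa] == m:  # attempted update
                if Q[sa] - U[sa] / m >= 2 * eps1:
                    Q[sa] = U[sa] / m + eps1
                    t_star = t
                elif t_last[sa] >= t_star:
                    learn[sa] = False  # nothing changed since the last attempt
                t_last[sa], U[sa], count[sa] = t, 0.0, 0
        elif t_last[sa] < t_star:
            learn[sa] = True  # some Q value changed, so allow a new attempt
        s = s2 if nxt else START  # reset after the last operation
    return Q


def greedy_plan(Q):
    s, plan = START, []
    while actions(s):
        a = max(actions(s), key=lambda x: Q[(s, x)])
        plan.append(a)
        s, _ = step(s, a)
    return plan


plan = greedy_plan(delayed_q())
print(plan)  # an optimal plan postpones b1, so only one tool change is needed
```

No ε-greedy randomness is needed here: the optimistic initial values drive exploration, because an untried pair keeps a high Q value until its accumulated targets pull it down.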