Solving time-varying maze with deep reinforcement learning for tiny devices
Colella, Stefano
2021/2022
Abstract
In the context of Tiny Machine Learning, the adoption of Deep Reinforcement Learning (DRL) has been severely restricted by the high computational demands of such algorithms. Consequently, many application fields have been unable to take advantage of recent progress in DRL. This work proposes an applied instance of a DRL algorithm that employs a Convolutional Neural Network (CNN) as its policy, designed to solve a physical time-varying tilting maze controlled by electric actuators, running on a cheap, off-the-shelf microcontroller unit (MCU). After training, the parameters of the policy network were quantized to an 8-bit encoding so that the model would fit into the embedded memory of the MCU, at the cost of an acceptable performance loss. The trained policy networks achieved encouraging results, with win rates between 87% and 99% depending on the difficulty of the task and the size of the network, and low inference times of about 10 milliseconds. Having satisfied the necessary requirements, these results in principle enable autonomous maze solving by the physical prototype, from sensing to decision making to actuation.
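The 8-bit quantization step described above can be illustrated with post-training full-integer quantization. The following is a minimal sketch assuming a TensorFlow/Keras toolchain with TFLite conversion; the thesis does not specify its framework, so the `policy_cnn` model path, the 32x32 grayscale observation shape, and the `representative_states` calibration generator are all hypothetical placeholders.

```python
import numpy as np
import tensorflow as tf

# Hypothetical: load the trained CNN policy saved as a Keras model.
policy = tf.keras.models.load_model("policy_cnn")

def representative_states():
    # Hypothetical calibration data: a few example maze observations,
    # used by the converter to estimate int8 quantization ranges.
    # Real usage would yield observations recorded from the environment.
    for _ in range(100):
        yield [np.random.rand(1, 32, 32, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(policy)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_states
# Force full-integer kernels so the MCU never needs float arithmetic.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("policy_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Full-integer quantization of this kind shrinks the stored weights roughly 4x relative to float32 and allows the MCU to execute pure int8 arithmetic, which is consistent with the abstract's trade-off of fitting the policy into embedded memory at a small accuracy cost.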
| File | Size | Format |
|---|---|---|
| Thesis.pdf (not accessible) | 2.85 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/196439