Integrating behavioral cloning into a reinforcement learning pipeline
D'Silva, Andrea
2022/2023
Abstract
In this thesis, we investigate the application of methods that fall under the Learning from Demonstration (LfD) paradigm. This paradigm is used to speed up the learning process of Reinforcement Learning (RL) algorithms. Such an approach is particularly crucial when applying RL to real-world problems, since in those scenarios acquiring new samples is commonly expensive (in terms of time or actual costs). We apply the framework to two well-known RL algorithms, Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC), to evaluate the effectiveness of the proposed approach. We test the proposed approach on simulated benchmark environments and discuss the cases in which it provides a significant improvement over the classical RL approach.
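The full thesis text is not reproduced on this page, so purely as a rough illustration of the kind of integration the title describes: a common way to combine behavioral cloning with an actor-critic algorithm such as DDPG or SAC is to add an imitation loss on expert demonstrations to the actor objective. The PyTorch sketch below shows this pattern; all names, network architectures, dimensions, and the fixed loss weighting are assumptions for illustration and are not taken from the thesis, which may use a different scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 4, 2  # illustrative dimensions, not from the thesis

# Placeholder actor and critic networks (architectures are assumptions).
actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh())

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state, action):
        # Q(s, a): state-action value estimate.
        return self.net(torch.cat([state, action], dim=-1))

critic = Critic()
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def actor_update(states, demo_states, demo_actions, bc_weight=1.0):
    """One actor step mixing a DDPG-style policy-gradient loss
    with a behavioral-cloning loss on expert demonstrations."""
    # RL term: maximize Q(s, pi(s)), i.e. minimize its negation.
    rl_loss = -critic(states, actor(states)).mean()
    # BC term: regress the policy toward the demonstrated actions.
    bc_loss = F.mse_loss(actor(demo_states), demo_actions)
    loss = rl_loss + bc_weight * bc_loss
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()

# Toy usage: random tensors stand in for replay-buffer and demonstration batches.
states = torch.randn(32, STATE_DIM)
demo_states = torch.randn(32, STATE_DIM)
demo_actions = torch.randn(32, ACTION_DIM).tanh()
actor_update(states, demo_states, demo_actions)
```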
| File | Description | Size | Format | Access |
|---|---|---|---|---|
| Thesis_Andrea_DSilva.pdf | Full thesis | 1.25 MB | Adobe PDF | Available online to authorized users only |
| Executive_Summary_Andrea_DSilva.pdf | Executive Summary | 446.96 kB | Adobe PDF | Available online to everyone |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/208354