Genetic algorithms and reinforcement learning: three studies on limits, improvements and hybridization

This thesis is about Machine Learning. Work began as an attempt to develop complex behaviours for a simple robotic system using Genetic Algorithms (GAs) but, due to limitations of the selected techniques and the subsequent inability to achieve the desired goal, focus shifted to an analysis of the two most widely used algorithms for machine learning in the recent literature, Genetic Algorithms themselves and Reinforcement Learning. The aim of this work is to propose three distinct experimental studies on both methods, culminating in a hybrid procedure. The first examines in more depth a considerable limit of GAs that arises when they are applied to stochastic environments, a topic that has not yet been addressed in the literature, even though applications of GAs to more complex environments have started to become popular. Indeed, Genetic Algorithms will be shown to be unable to learn in those settings, due to the Generalization Limit. As a result of this limit, interest moved to the more robust methods of Reinforcement Learning and, in particular, to the DDPG algorithm. Focusing on this topic, the second study addresses one of its most renowned limitations, Off-Policy Learning, by introducing a simple but novel framework, called Policy Feedback, that both increases the performance of its baseline algorithm and proves useful for better understanding the reasons behind this limit. Finally, the third study tries to combine Genetic Algorithms and Reinforcement Learning in a single procedure, by leveraging the Policy Feedback technique. As a result, two distinct hybrid algorithms are introduced. The first, GEDReL, is able to solve the Generalization Limit of its genetic component. However, it is not able to outperform the state-of-the-art distributed exploration method AE-DDPG. As a consequence, the second algorithm, X-DDPG, is introduced by combining AE-DDPG and GEDReL.
X-DDPG is able to outperform both of its baselines while retaining the benefits of GEDReL on the genetic side. Each of the studies in this thesis could offer support for future research. New Genetic Algorithm variants could be designed to be more robust to the Generalization Limit when dealing with stochastic environments, such as many Reinforcement Learning benchmarks. Further studies on the Policy Feedback framework could be developed to improve the understanding of the issues behind Off-Policy Learning. Finally, X-DDPG could be improved by combining its DDPG baseline with state-of-the-art procedures like natural actor-critic or distributional policy gradient.

This thesis deals with Machine Learning. The work began as an attempt to use Genetic Algorithms to teach complex behaviours to robotic systems. Owing to the limits of these methods, the work then focused on the study of the most widespread algorithms for machine learning: Genetic Algorithms themselves and Reinforcement Learning. The aim of this work is to propose three separate studies, one for each method, culminating in the third with the formulation of a hybrid method. The first study, not present in the literature, examines in depth the limit encountered with Genetic Algorithms when they are used to teach behaviours to agents in stochastic environments. In these cases, Genetic Algorithms are in fact unable to learn. The second study focuses on Reinforcement Learning and, in particular, on the DDPG algorithm. Specifically, it addresses the problem of Off-Policy learning, which has long caused difficulties in this field, although the reasons are still unclear. During this study an algorithm named Policy Feedback, based on DDPG, is formulated; it is able to exceed the performance of its baseline and also provides results of interest for better understanding the reasons behind these limits. The third study covers two hybrid algorithms introduced in this thesis, GEDReL and X-DDPG. The idea behind GEDReL is to combine the exploratory power of Genetic Algorithms with the robustness of DDPG. GEDReL is shown, through the interaction between the two algorithms, to alleviate the limit of Genetic Algorithms discussed in the first study, but it is not able to exceed the performance of the state of the art in distributed learning, AE-DDPG. Consequently, X-DDPG is introduced by combining GEDReL with AE-DDPG. X-DDPG is shown to achieve competitive results while retaining the qualities of GEDReL.