Exploiting Action-Value Uncertainty to Drive Exploration in Reinforcement Learning
CINI, ANDREA
2017/2018
Abstract
Reinforcement Learning (RL) addresses the problem of training an agent to act in an unknown environment while maximizing a payoff signal. The agent needs to proactively explore the environment to understand its structure, its dynamics, its own capabilities to interact with it, and how to benefit the most from that interaction. In other words, the amount of useful information available to the agent depends entirely on its ability to gather that information autonomously. The purpose of this thesis is to propose a set of techniques, based on Thompson Sampling (TS), that drive exploration by exploiting the agent's uncertainty over its current beliefs about the optimality of the available choices. We discuss the problem of exploration in RL and analyze some of the solutions available in the literature; building upon this knowledge, we propose novel methods to address the problem. We present several iterative procedures to estimate uncertainty in RL and use it to derive efficient sampling strategies, each suited to environments of different complexity, using different classes of algorithms and approximators, from tabular solutions to deep neural networks. Finally, we analyze the performance of the proposed techniques on problems of increasing difficulty, from a simple maze to challenging videogames.

| File | Description | Size | Format |
|---|---|---|---|
| thesis.pdf | Thesis text (publicly accessible on the internet) | 1.65 MB | Adobe PDF |
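The core idea described in the abstract, driving exploration through the agent's posterior uncertainty over action values rather than through fixed random action selection, can be illustrated in its simplest form. The sketch below is not the thesis's actual algorithms; it is a minimal, assumed illustration of Thompson Sampling on a toy Beta-Bernoulli multi-armed bandit, with the function name and bandit setup chosen for this example:

```python
import random

def thompson_sampling(arm_probs, n_steps=5000, seed=0):
    """Minimal Beta-Bernoulli Thompson Sampling on a toy bandit.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over
    its payoff probability. At every step we draw one sample per arm
    from its posterior and pull the arm whose sample is highest, so
    exploration is driven by the agent's uncertainty: arms that were
    tried rarely have wide posteriors and still get sampled high.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_steps):
        # One posterior sample per arm; acting greedily on these
        # samples is the Thompson Sampling action-selection rule.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        # Bernoulli payoff from the chosen arm, then posterior update.
        reward = 1 if rng.random() < arm_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

As the posteriors concentrate, play shifts almost entirely to the best arm; the same sampling-from-beliefs principle is what the thesis scales from tabular estimates up to deep-network approximators.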
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/142942