Reinforcement learning: from theory to algorithms

This thesis is mainly based on the idea that the design of an algorithm must be supported by theoretical results. Instead of starting from an algorithm and analyzing its properties and guarantees, we will derive theoretical results and we will investigate their applicability to design practical algorithms. All the analysis will be performed in the reinforcement learning framework, i.e., a mathematical framework for learning by interaction. This thesis summarizes recent results in different reinforcement learning topics and builds on these results to provide novel algorithms with an attention to performance guarantees. The thesis is organized in 3 parts, one for each studied topic. We start investigating the recent advances in the framework of safe reinforcement learning, i.e., algorithm with strong performance guarantees. We suggest algorithms that, by exploiting a lower bound to the expected performance gain, are able to guarantee a monotonic performance improvement overtime. The idea behind this analysis is to reduce the gap with classical control theory where the most attention is based on robustness and stability. However, to face many real applications reinforcement learning must be able to handle problems with multiple objectives. The second part of the thesis deals with this topic. We present three new algorithms that exploit gradient methods to construct a (continuous or discrete) approximation of the Pareto frontier. On the top of that, we will show that the multi-objective framework is a natural way to explain the problem of inverse reinforcement learning with linear reward parametrization. We will exploit this consideration to derive algorithms (CPIRL and PGIRL) with strong error guarantees that differ for the amount of information required. The goal of inverse reinforcement learning is to recover the motivations that make an agent to behave optimally. We will present an algorithm (GIRL) that removes the assumption of linear parametrization. Moreover, we will show that PGIRL is an efficient implementation of the more general nonlinear-reward algorithm (GIRL) in the particular case of linear reward parametrizations.

Questa tesi si basa sull'idea che i risultati teorici devono fornire la base portante su cui definire la struttura di un algoritmo. Al posto di partire da un algoritmo ed analizzare le sue proprietà e garanzie, in questo lavoro ci focalizzeremo sulla derivazione di risultati teorici e sul loro utilizzo per la definizione di algoritmi pratici. Tutta questa analisi verrà eseguita nell'ambito del reinforcement learning, ovvero un set di strumenti matematici per risolvere problemi di apprendimento per interazione. Questa tesi riassume gli sviluppi più recenti in questo campo e li sfrutta per derivare nuovi algoritmi con forti garanzie teoriche. La tesi è organizzata in 3 parti, una per ogni argomento studiato. Inizieremo analizzando i recenti sviluppi nell'ambito del safe reinforcement learning, ovvero algoritmi con forti garanzie teoriche. Qui presenteremo algoritmi che, sfruttando un bound inferiore al guadagno atteso di una politica, sono in grado di garantire un miglioramento monotono della funzione obiettivo ad ogni iterazione. L'idea dietro questa analisi è quella di ridurre la distanza con la teoria classica del controllo dove l'attenzione è posta sulla robustezza e stabilità. Tuttavia, per poter affrontare problemi reali è necessario essere in grado di gestire molteplici obiettivi. Presenteremo tre nuovi algoritmi che sfruttano metodi a gradiente per costruire una approssimazione della frontiera di Pareto, sia continua che discreta. Partendo da questi risultati, mostreremo che gli approcci multi-obiettivo sono un modo naturale per spiegare problemi di inverse reinforcement learning con parametrizzazione lineare della funzione di rinforzo. Inverse reinforcement learning ambisce a recuperare le informazioni che fanno agire l'agente in modo ottimo. Useremo queste considerazioni per derivare due algoritmi (CPIRL and PGIRL) con forti garanzie teoriche. Questi algoritmi differiscono per la quantità di informazioni richieste. In aggiunta, presenteremo un algoritmo (GIRL) che rimuove l'assunzione di parametrizzazione lineare. Infine, mostreremo che PGIRL è una efficiente implementazione del più generale algoritmo GIRL con parametrizzazione lineare.

Reinforcement learning: from theory to algorithms

PIROTTA, MATTEO

Abstract

This thesis is mainly based on the idea that the design of an algorithm must be supported by theoretical results. Instead of starting from an algorithm and analyzing its properties and guarantees, we will derive theoretical results and we will investigate their applicability to design practical algorithms. All the analysis will be performed in the reinforcement learning framework, i.e., a mathematical framework for learning by interaction. This thesis summarizes recent results in different reinforcement learning topics and builds on these results to provide novel algorithms with an attention to performance guarantees. The thesis is organized in 3 parts, one for each studied topic. We start investigating the recent advances in the framework of safe reinforcement learning, i.e., algorithm with strong performance guarantees. We suggest algorithms that, by exploiting a lower bound to the expected performance gain, are able to guarantee a monotonic performance improvement overtime. The idea behind this analysis is to reduce the gap with classical control theory where the most attention is based on robustness and stability. However, to face many real applications reinforcement learning must be able to handle problems with multiple objectives. The second part of the thesis deals with this topic. We present three new algorithms that exploit gradient methods to construct a (continuous or discrete) approximation of the Pareto frontier. On the top of that, we will show that the multi-objective framework is a natural way to explain the problem of inverse reinforcement learning with linear reward parametrization. We will exploit this consideration to derive algorithms (CPIRL and PGIRL) with strong error guarantees that differ for the amount of information required. The goal of inverse reinforcement learning is to recover the motivations that make an agent to behave optimally. We will present an algorithm (GIRL) that removes the assumption of linear parametrization. Moreover, we will show that PGIRL is an efficient implementation of the more general nonlinear-reward algorithm (GIRL) in the particular case of linear reward parametrizations.

Scheda breve

Scheda completa

	Relatore
	
				BASCETTA, LUCA
			
	Coordinatore
	
				BONARINI, ANDREA
			
	Tutor
	
				LOVERA, MARCO
			
	Correlatore/i
	
				RESTELLI, MARCELLO
			
	Data
	
				18-gen-2016
			
	Abstract in italiano
	
				Questa tesi si basa sull'idea che i risultati teorici devono fornire la base portante su cui definire la struttura di un algoritmo.
Al posto di partire da un algoritmo ed analizzare le sue proprietà e garanzie, in questo lavoro ci focalizzeremo sulla derivazione di risultati teorici e sul loro utilizzo per la definizione di algoritmi pratici. Tutta questa analisi verrà eseguita nell'ambito del reinforcement learning, ovvero un set di strumenti matematici per risolvere problemi di apprendimento per interazione. Questa tesi riassume gli sviluppi più recenti in questo campo e li sfrutta per derivare nuovi algoritmi con forti garanzie teoriche.
La tesi è organizzata in 3 parti, una per ogni argomento studiato. Inizieremo analizzando i recenti sviluppi nell'ambito del safe reinforcement learning, ovvero algoritmi con forti garanzie teoriche. Qui presenteremo algoritmi che, sfruttando un bound inferiore al guadagno atteso di una politica, sono in grado di garantire un miglioramento monotono della funzione obiettivo ad ogni iterazione. L'idea dietro questa analisi è quella di ridurre la distanza con la teoria classica del controllo dove l'attenzione è posta sulla robustezza e stabilità.
Tuttavia, per poter affrontare problemi reali è necessario essere in grado di gestire molteplici obiettivi. Presenteremo tre nuovi algoritmi che sfruttano metodi a gradiente per costruire una approssimazione della frontiera di Pareto, sia continua che discreta.
Partendo da questi risultati, mostreremo che gli approcci multi-obiettivo sono un modo naturale per spiegare problemi di inverse reinforcement learning con parametrizzazione lineare della funzione di rinforzo. Inverse reinforcement learning ambisce a recuperare le informazioni che fanno agire l'agente in modo ottimo. Useremo queste considerazioni per derivare due algoritmi (CPIRL and PGIRL) con forti garanzie teoriche. Questi algoritmi differiscono per la quantità di informazioni richieste. In aggiunta, presenteremo un algoritmo (GIRL) che rimuove l'assunzione di parametrizzazione lineare. Infine, mostreremo che PGIRL è una efficiente implementazione del più generale algoritmo GIRL con parametrizzazione lineare.
			
	Tipo di documento
	
				Tesi di dottorato
			
	Appare nelle tipologie:
	
				Tesi di Dottorato

File allegati

File	Dimensione	Formato
pirotta_phd_2015.pdf accessibile in internet solo dagli utenti autorizzati Descrizione: Testo della tesi Dimensione 4.12 MB Formato Adobe PDF Visualizza/Apri	4.12 MB	Adobe PDF	Visualizza/Apri

I documenti in POLITesi sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10589/117511