Provably efficient algorithms for the reinforcement learning problem with expert advisors

In some reinforcement learning problems, the agent may be equipped with a set of expert advisors, which may be the optimal policies associated with previously solved tasks or directives provided by humans. The problems of policy advice and policy reuse focus on improving the learning of a reinforcement learning agent by efficiently reusing the knowledge provided by the experts. The first is simply the problem of finding the best expert policy among those provided. The second tries to improve the initial learning phase of an off-policy learner by providing expert examples. First, we show how to adapt information-directed sampling to the policy advice problem, and we propose a variant that explores the whole space of policies through policy gradient techniques. Then, we consider the policy reuse problem in a non-stationary stochastic bandit setting, and we show how, by making reasonable assumptions on the non-stationary return of the off-policy learner, it is possible to obtain provably efficient algorithms with low regret with respect to a dynamic oracle. The performance of the presented algorithms is then tested empirically.
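As a rough illustration of the policy advice setting, the sketch below treats each expert policy as an arm of a bandit whose episode return is Bernoulli (success or failure) and picks experts with variance-based information-directed sampling in the style of Russo and Van Roy. The Bernoulli returns, the Beta(1, 1) priors, the Monte Carlo estimates and the grid search over mixtures of two experts are all assumptions made for this example, and the names `ids_choose` and `run_policy_advice` are hypothetical. It is not the algorithm developed in the thesis.

```python
import random


def ids_choose(alpha, beta, n_samples=1000, grid=51, rng=random):
    """Choose an expert index with variance-based IDS under Beta posteriors.

    Illustrative assumption: each expert's episode return is Bernoulli with an
    unknown mean, and alpha[i], beta[i] are the parameters of its Beta posterior.
    """
    k = len(alpha)
    # Monte Carlo samples of all experts' mean returns drawn from the posterior.
    samples = [[rng.betavariate(alpha[i], beta[i]) for i in range(k)]
               for _ in range(n_samples)]
    # Index of the best expert within each posterior sample.
    best = [max(range(k), key=s.__getitem__) for s in samples]
    mean = [sum(s[i] for s in samples) / n_samples for i in range(k)]
    # Expected regret of each expert: E[max_j mu_j] - E[mu_i].
    opt_value = sum(s[b] for s, b in zip(samples, best)) / n_samples
    delta = [opt_value - mean[i] for i in range(k)]
    # Variance-based information gain:
    # v_i = sum_{a*} P(A* = a*) * (E[mu_i | A* = a*] - E[mu_i])^2
    gain = [0.0] * k
    for star in range(k):
        rows = [s for s, b in zip(samples, best) if b == star]
        if not rows:
            continue
        p_star = len(rows) / n_samples
        for i in range(k):
            cond_mean = sum(r[i] for r in rows) / len(rows)
            gain[i] += p_star * (cond_mean - mean[i]) ** 2
    # Minimise the information ratio Delta(pi)^2 / v(pi) over mixtures of at
    # most two experts (an optimal IDS distribution exists with such support).
    best_ratio, choice = float("inf"), None
    for i in range(k):
        for j in range(i, k):
            for g in range(grid):
                q = g / (grid - 1)
                d = q * delta[i] + (1 - q) * delta[j]
                v = q * gain[i] + (1 - q) * gain[j]
                if v > 0 and d * d / v < best_ratio:
                    best_ratio, choice = d * d / v, (i, j, q)
    if choice is None:
        # The estimates carry no information about the optimum: act greedily.
        return max(range(k), key=mean.__getitem__)
    i, j, q = choice
    return i if rng.random() < q else j


def run_policy_advice(true_means, horizon, seed=0):
    """Simulate IDS over experts with hypothetical success rates true_means."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha, beta = [1.0] * k, [1.0] * k  # Beta(1, 1) priors (an assumption)
    pulls = [0] * k
    for _ in range(horizon):
        a = ids_choose(alpha, beta, n_samples=300, rng=rng)
        reward = 1 if rng.random() < true_means[a] else 0  # one Bernoulli episode
        alpha[a] += reward
        beta[a] += 1 - reward
        pulls[a] += 1
    return pulls
```

For example, `run_policy_advice([0.2, 0.5, 0.8], horizon=300)` should assign most episodes to the third expert: IDS keeps sampling a weaker expert only while the information it reveals about the best one justifies its expected regret.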

Taormina, Giacomo

2019/2020

Advisor: RESTELLI, MARCELLO
Co-advisor(s): TIRINZONI, ANDREA
School / Dept.: ING - Scuola di Ingegneria Industriale e dell'Informazione (School of Industrial and Information Engineering)
Date: 15 December 2020
Academic year: 2019/2020
Appears in type: Master's thesis

Attached files

File	Size	Format
tesi.pdf (authorized users only from 27/11/2021; description: Master's thesis)	2.6 MB	Adobe PDF

Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/170841
