On the estimation of the maximum expected value in reinforcement learning

This thesis addresses the estimation of the Maximum Expected Value of a set of random variables. Unfortunately, an unbiased estimator of this value does not exist. The two state-of-the-art estimators, the Maximum Estimator and the Double Estimator, perform well in opposite boundary conditions. We introduce the Weighted Estimator, which outperforms the other estimators in intermediate conditions. This estimator computes the Maximum Expected Value estimate as a weighted average of the sample means of the random variables, where each weight is the probability that the corresponding variable is the maximum. After a complete analysis of its bias and variance, the proposed estimator is compared theoretically and empirically with the two state-of-the-art methods. The computation of the Maximum Expected Value plays a central role in many applications. In this thesis, we focus on the Reinforcement Learning scenario, showing how the proposed estimator affects the performance of several algorithms. We start by evaluating the Weighted Estimator in classical finite state-action domains. However, the most interesting problems are those characterized by continuous state or action spaces. By leveraging Gaussian Process regression, we extend the Weighted Estimator to deal with continuous state spaces. Finally, we provide one of the first approaches able to directly estimate the maximum of infinitely many random variables. This last contribution is tested on continuous Reinforcement Learning problems.
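The three estimators named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation, which is not available from this record. All function names are hypothetical. The Double Estimator shown here uses a single random split rather than averaging over both directions. The weights of the Weighted Estimator are approximated by Monte Carlo draws from a Gaussian approximation of each sample mean; both the Gaussian approximation and the Monte Carlo step are assumptions made for illustration.

```python
import random
import statistics


def maximum_estimator(samples):
    """Maximum Estimator: the largest sample mean."""
    return max(statistics.fmean(s) for s in samples)


def double_estimator(samples, rng):
    """Double Estimator (one-directional sketch): select the best variable
    on one half of the data and evaluate it on the other half."""
    half_a, half_b = [], []
    for s in samples:
        shuffled = s[:]
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half_a.append(shuffled[:mid])
        half_b.append(shuffled[mid:])
    means_a = [statistics.fmean(s) for s in half_a]
    best = max(range(len(samples)), key=means_a.__getitem__)
    return statistics.fmean(half_b[best])


def weighted_estimator(samples, rng, n_draws=20000):
    """Weighted Estimator sketch: weighted average of the sample means,
    each weight being the estimated probability that the variable is the maximum."""
    means = [statistics.fmean(s) for s in samples]
    # Assumption: each sample mean is approximately Gaussian with this standard error.
    std_errs = [statistics.stdev(s) / len(s) ** 0.5 for s in samples]
    wins = [0] * len(samples)
    for _ in range(n_draws):
        draws = [rng.gauss(m, se) for m, se in zip(means, std_errs)]
        wins[max(range(len(draws)), key=draws.__getitem__)] += 1
    weights = [w / n_draws for w in wins]
    estimate = sum(w * m for w, m in zip(weights, means))
    return estimate, weights


# Illustrative data only: three Gaussian variables with made-up true means.
rng = random.Random(0)
true_means = [0.0, 0.2, 0.5]
samples = [[rng.gauss(mu, 1.0) for _ in range(50)] for mu in true_means]
estimate, weights = weighted_estimator(samples, rng)
print("ME:", maximum_estimator(samples))
print("DE:", double_estimator(samples, rng))
print("WE:", estimate, "weights:", weights)
```

Because the weights are non-negative and sum to one, the Weighted Estimator in this sketch never exceeds the Maximum Estimator computed on the same data.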

NUARA, ALESSANDRO

2015/2016

Advisor: RESTELLI, MARCELLO
Co-advisors: D'ERAMO, CARLO; PIROTTA, MATTEO
School / Dept.: ING - Scuola di Ingegneria Industriale e dell'Informazione
Date: 28 July 2016
Academic year: 2015/2016
Abstract (translated from the Italian):
This thesis addresses the problem of estimating the Maximum Expected Value of a set of random variables. Unfortunately, no unbiased estimator of this value exists, and the two existing estimators, the Maximum Estimator and the Double Estimator, perform well in opposite boundary conditions. In this thesis we introduce the Weighted Estimator, which outperforms the other estimators in intermediate conditions. This estimator computes the Maximum Expected Value as a weighted average of the sample means of the random variables, where the weights correspond to the probability of each random variable being the maximum. After a complete analysis of its bias and variance, we compare this estimator, both theoretically and empirically, with the other existing estimators.
The computation of the Maximum Expected Value plays a central role in many applications; in this thesis, we focus on the Reinforcement Learning scenario, showing how the use of the proposed estimator affects the performance of several algorithms. Starting from the evaluation of the Weighted Estimator on classical problems with discrete state and action spaces, we extend its application to problems with continuous state and action spaces. Through Gaussian Process regression, we extend the Weighted Estimator to Reinforcement Learning problems with continuous state space as well. Finally, we present one of the first approaches that makes it possible to estimate the Maximum Expected Value of infinitely many random variables. This last contribution is also tested on Reinforcement Learning problems.
Document type: Master's thesis (Tesi di laurea Magistrale)

Attached files

File	Description	Size	Format
2016_07_Nuara.pdf (not accessible)	Thesis text	726.07 kB	Adobe PDF

Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/122835
