Risk averse trees for learning from logged bandit feedback

Biblioteche e Archivi
POLITesi - Archivio digitale delle tesi di laurea e di dottorato

On-line learning algorithm are common in digital online systems and the interactions of such systems can be stored in logs at little cost. In this work we develop a risk--adverse greedy algorithm LCBT for batch learning from log data collected by an on--line learning algorithm working in a bandit setting. LCBT builds a decision tree using statistical bounds and provides stationary worst case optimal policy, as well as a lower bound on its expected performance on future data. The tree structure can be easily interpreted by humans and can reveal useful information about the underlying nature of the application domain. After having introduced and described the problem setting, we present and analyze the algorithm in details. Finally, we provide the results of a wide testing campaign, comparing the performance of LCBT with state-of-the-art competitors and showing its results when applied on real-world datasets.

Gli algoritmi di apprendimento automatico sono ormai diffusi in molti sistemi digitali e ciò che avviene in questi sistemi può essere salvato in log con facilità. In questo lavoro sviluppiamo un algoritmo greedy avverso al rischio, chiamato LCBT, che è in grado di imparare analizzando log prodotti da algoritmi che operano in un ambiente di tipo bandit. LCBT costruisce un albero decisionale usando buond statistici e fornisce una policy stazionaria che ottiene buoni risultati anche negli scenari più avversi. L'algoritmo calcola anche una stima delle performance della policy proposta. La struttura ad albero può facilmente essere interpretata dall'uomo e può rivelare informazioni utili sulla natura del contesto da cui i dati sono stati ottenuti. Dopo aver introdotto e descritto il problema generale, presentiamo e analizziamo l'algoritmo in dettaglio. Infine mostriamo i risultati dei test effettuati, sia comparando le performance di LCBT con l'attuale stato dell'arte che applicando l'algoritmo a dati provenienti da problemi reali.

Risk averse trees for learning from logged bandit feedback

SIMONE, PAOLO

2015/2016

Abstract

On-line learning algorithm are common in digital online systems and the interactions of such systems can be stored in logs at little cost. In this work we develop a risk--adverse greedy algorithm LCBT for batch learning from log data collected by an on--line learning algorithm working in a bandit setting. LCBT builds a decision tree using statistical bounds and provides stationary worst case optimal policy, as well as a lower bound on its expected performance on future data. The tree structure can be easily interpreted by humans and can reveal useful information about the underlying nature of the application domain. After having introduced and described the problem setting, we present and analyze the algorithm in details. Finally, we provide the results of a wide testing campaign, comparing the performance of LCBT with state-of-the-art competitors and showing its results when applied on real-world datasets.

Scheda breve

Scheda completa

	Relatore
	
				RESTELLI, MARCELLO
			
	Correlatore/i
	
				PALADINO, STEFANO
			
	Scuola / Dip.
	
				ING  - Scuola di Ingegneria Industriale e dell'Informazione
			
	Data
	
				28-set-2016
			
	Anno accademico
	
				2015/2016
			
	Abstract in italiano
	
				Gli algoritmi di apprendimento automatico sono ormai diffusi in molti sistemi digitali e ciò che avviene in questi sistemi può essere salvato in log con facilità. In questo lavoro sviluppiamo un algoritmo greedy avverso al rischio, chiamato LCBT, che è in grado di imparare analizzando log prodotti da algoritmi che operano in un ambiente di tipo bandit. LCBT costruisce un albero decisionale usando buond statistici e fornisce una policy stazionaria che ottiene buoni risultati anche negli scenari più avversi. L'algoritmo calcola anche una stima delle performance della policy proposta. La struttura ad albero può facilmente essere interpretata dall'uomo e può rivelare informazioni utili sulla natura del contesto da cui i dati sono stati ottenuti. Dopo aver introdotto e descritto il problema generale, presentiamo e analizziamo l'algoritmo in dettaglio. Infine mostriamo i risultati dei test effettuati, sia comparando le performance di LCBT con l'attuale stato dell'arte che applicando l'algoritmo a dati provenienti da problemi reali.
			
	Tipo di documento
	
				Tesi di laurea Magistrale
			
	Appare nelle tipologie:
	
				Tesi di laurea Magistrale

File allegati

File	Dimensione	Formato
2016_09_Simone.pdf non accessibile Descrizione: Testo della tesi Dimensione 833.57 kB Formato Adobe PDF Visualizza/Apri	833.57 kB	Adobe PDF	Visualizza/Apri

I documenti in POLITesi sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10589/126281