Boosted fitted Q-iteration

Recent years have seen broad technological advances in Artificial Intelligence and Machine Learning. Machine learning often requires enormous computational resources and large datasets; our work aims to provide a method that uses both computational resources and data efficiently. This method belongs to the class of semi-supervised machine learning algorithms called Reinforcement Learning. Reinforcement Learning addresses problems in which an agent is situated in an environment that it can both observe and interact with by means of actions. More specifically, the agent should maximize a reward signal that depends on the current state and the actions performed; this signal often indicates how well the agent is behaving. More precisely, our method belongs to the class of algorithms named Approximate Value Iteration, whose core idea is to estimate the value associated with each state-action pair, so that actions that maximize the reward signal can be chosen. Our method, Boosted Fitted Q-Iteration (B-FQI), rests on a solid theoretical foundation and adapts boosting, a well-known supervised learning method, to reinforcement learning. Boosting allows the use of simpler models that require fewer computational resources and that dynamically adapt their complexity to the desired target. Our algorithm introduces boosting into reinforcement learning efficiently. We provide a theoretical analysis of how the error propagates through the iterations and of how it depends on both the expressivity of the functional space and the size of the dataset. Finally, we report empirical results that support our theoretical statements regarding the ability of B-FQI to use simpler models than generic FQI, and we also provide an analysis of its data efficiency.
B-FQI proves efficient in its use of computational resources and certainly deserves further study beyond the work developed in this thesis.
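
To make the idea of boosting inside value iteration concrete, the following is a minimal sketch, not the implementation analysed in this thesis. It assumes one common reading of the approach: at each iteration a weak learner is fitted to the empirical Bellman residual of the current estimate and added to a running sum, so the model grows only as much as the target requires. The regression stump used as weak learner, the function names `fit_stump` and `boosted_fqi`, and the 5-state chain used as data are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_stump(X, y):
    """Least-squares regression stump: one feature, one threshold, two leaf means."""
    # Fallback (no valid split): predict the global mean everywhere.
    best = (np.inf, 0, -np.inf, y.mean(), y.mean())
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lo, hi = y[left].mean(), y[~left].mean()
            sse = ((y[left] - lo) ** 2).sum() + ((y[~left] - hi) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, lo, hi)
    _, j, t, lo, hi = best
    return lambda Z: np.where(Z[:, j] <= t, lo, hi)

def boosted_fqi(S, A, R, S_next, done, n_actions, gamma=0.9, n_iters=30):
    """Assumed update: Q_{k+1} = Q_k + h_k, with h_k fitted to the Bellman residual."""
    X = np.column_stack([S, A]).astype(float)
    learners = []  # Q_k is the sum of all weak learners fitted so far

    def q(states, actions):
        Z = np.column_stack([states, actions]).astype(float)
        return sum((h(Z) for h in learners), np.zeros(len(Z)))

    for _ in range(n_iters):
        # Greedy value of the next state under the current estimate.
        q_next = np.max(
            [q(S_next, np.full(len(S_next), a)) for a in range(n_actions)], axis=0
        )
        target = R + gamma * (1.0 - done) * q_next  # empirical Bellman backup
        residual = target - q(S, A)                 # what Q_k still fails to explain
        learners.append(fit_stump(X, residual))     # one weak learner per iteration
    return q

# Toy data (hypothetical): 5-state chain, actions 0 = left, 1 = right,
# reward 1 for stepping from state 3 into the terminal state 4.
S, A = np.repeat(np.arange(4), 2), np.tile(np.arange(2), 4)
S_next = np.clip(S + 2 * A - 1, 0, 4)
R = (S_next == 4).astype(float)
done = R.copy()

q = boosted_fqi(S, A, R, S_next, done, n_actions=2)
print(q(np.arange(4), np.ones(4)))  # estimated value of moving right in each state
```

Each stump on its own is far too simple to represent the value function. In this sketch, whatever expressive power the estimate has comes from summing the stumps, which is the sense in which the complexity of the model adapts to the target.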

Recently we have witnessed substantial progress in Artificial Intelligence and in Machine Learning. Machine learning techniques often require enormous computational resources and very large datasets; our work aims to provide a method that uses computational resources and data efficiently. This method falls within the class of semi-supervised machine learning algorithms called Reinforcement Learning. Reinforcement learning deals with problems in which an agent is immersed in a world that it can observe and in which it can take actions that modify the world's state. More precisely, the agent must maximize over time a reward signal that depends on the state and on the actions the agent takes. Our method falls more precisely within the class of Approximate Value Iteration algorithms, whose core is to estimate the value associated with each state-action pair, so that the actions that prove best can then be chosen. Our method, Boosted Fitted Q-Iteration (B-FQI), rests on a solid theoretical framework and introduces boosting into reinforcement learning efficiently. Boosting makes it possible to use simpler models that require fewer computational resources and that dynamically adapt their complexity to the desired target. A theoretical evaluation is provided here, addressing both how the error propagates from one iteration to the next and how the error depends on the size of the dataset and the expressivity of the functional space. Finally, we report experimental results that support the theoretical arguments regarding the ability of B-FQI to use simpler models than generic AVI, and we also provide an analysis of efficiency in the use of data.
B-FQI proves efficient in its use of computational resources and certainly deserves further study beyond that carried out in this thesis.