On the sample complexity of inverse reinforcement learning

Reinforcement Learning is one of the most promising research directions in the field of Artificial Intelligence. It provides techniques for solving sequential decision-making problems that satisfy specific properties. In many cases, the presence of an expert, an agent that knows how to optimally solve the given problem, allows to teach to our agent how to behave by imitating the expert. Inverse Reinforcement Learning defines a powerful family of algorithms for solving the Imitation Learning problem. The main idea behind Inverse Reinforcement Learning is that our artificial intelligent agent can learn not merely by copying the actions of the expert, but by firstly studying and analyzing the reasons and the motivations that make the expert behave in that way. In other words, this discipline allows, in specific settings, to retrieve the goal of the expert, and then, to use it to understand which is the best strategy that our agent can apply in order to achieve that goal. In most cases, the strategy of the expert and the dynamics of the environment are not explicitly provided, but they have to be estimated based on expert demonstrations. In this thesis, I focus on the number of samples that must be collected in order to solve the Inverse Reinforcement Learning problem in an acceptable way according to a certain index of performance. The analysis conducted is a worst-case analysis, namely it aims to compute the minimum number of samples that any algorithm, even the best one, needs to collect in the worst possible problem instance to provide an acceptable solution. One main setting is considered that assumes the presence of an oracle that can provide samples for any possible configuration of states where the agent is located and actions taken. The results obtained allow to characterize the sample complexity of Inverse Reinforcement Learning according to the problem definition adopted in this thesis.

L'Apprendimento per Rinforzo è, ad oggi, una delle direzioni di ricerca più promettenti nell'ambito dell'Intelligenza Artificiale. Esso infatti fornisce tecniche per risolvere specifici problemi in cui un agente deve prendere sequenze di decisioni nel tempo. Spesso, la presenza di un esperto, cioè un agente che sa come risolvere il problema in maniera ottimale, ci permette di insegnare al nostro agente come prendere le decisioni semplicemente imitandolo. L'Apprendimento per Rinforzo Inverso definisce una consistente famiglia di algoritmi per risolvere il problema dell'Apprendimento per Imitazione. L'idea principale dell'Apprendimento per Rinforzi Inverso è che il nostro agente può imparare non soltanto semplicemente copiando le azioni dell'esperto, ma studiando e analizzando le motivazioni che guidano l'esperto nelle sue decisioni. In altre parole, questa disciplina permette, in particolari problemi, di recuperare l'obiettivo dell'esperto e poi di usarlo per capire qual è la miglior strategia che il nostro agente può mettere in pratica per raggiungerlo. In questa tesi, mi concentro sul numero di campioni che devono essere raccolti per risolvere il problema dell'Apprendimento per Rinforzo Inverso in maniera accettabile secondo un certo indice per misurare le prestazioni. L'analisi condotta è un'analisi del caso pessimo, cioè mira a calcolare il minimo numero di campioni che ogni algoritmo, persino il migliore, deve raccogliere nell'istanza di problema peggiore per fornire una soluzione accettabile. La configurazione principale che viene considerata è quella in cui è presente un oracolo che fornisce campioni per qualsivoglia combinazione di stati in cui l'agente può trovarsi e azioni che può prendere. I risultati ottenuti permettono di caratterizzare la complessità campionaria dell'Apprendimento per Rinforzo Inverso secondo la definizione di problema adottata in questa tesi.