Learning from expert demonstrations on F1 simulators, with transfer learning across different vehicle setups

Autonomous driving has been one of the main topics in Artificial Intelligence for decades. In this thesis, we experiment with autonomous driving agents for racing car simulators, focusing on the pure problem of optimizing driving for racing performance, without the logistical, legal and ethical questions that famously characterize the field of autonomous driving. The project encompasses two simulator environments: the Ferrari F1 simulator, on which we perform purely batch analyses of a collection of laps driven by expert drivers; and TORCS, an open source racing car simulator. We apply various Reinforcement Learning approaches to the problem of estimating an optimal driving policy, starting with value-based methods and arriving at policy search methods combined with imitation learning. We also experiment with Transfer Learning across vehicle configurations, with the goal of reusing data collected for different car setups to speed up the learning process for a new target configuration. We analyse the obtained policies, highlighting the limits of value-based methods applied to continuous-space problems, in particular their shortcomings in producing smooth and accurate steering behaviour. We then compare the models and performance obtained with transfer learning methods based on importance sampling to those obtained with direct approaches.

Autonomous driving has been one of the main subjects of study in artificial intelligence for decades. In this thesis we present experiments on autonomous driving for racing simulators, concentrating on the problem of pure driving in terms of sporting performance, setting aside the logistical, legal and ethical problems that have historically characterized the field of autonomous driving. Our project takes place in two different settings: the Ferrari racing simulator, on which we limit ourselves to batch analyses of a collection of laps driven by professional drivers; and the open source racing simulator TORCS. We applied various Reinforcement Learning approaches to the problem of learning an optimal driving policy, starting with value-based methods and arriving at policy search methods combined with imitation learning. We also experimented with Transfer Learning across different vehicle configurations, with the aim of reusing and exploiting data collected from cars with different setups to accelerate the learning of a driving policy for a new target car. In the final chapter we show the analyses of the obtained policies, highlighting the limits of value-based approaches applied to continuous-space problems, in particular their inadequacy in generating smooth steering profiles. Finally, we compare the models and results obtained through Transfer Learning, based on importance sampling, with those obtained through direct learning.