A reinforcement learning approach to adversarial team games

Environments in which multiple rational agents interact with one another and with the environment, each trying to maximize its own outcome, have attracted increasing interest in recent years. This is explained by the ubiquity of Artificial Intelligence applications in real-world scenarios that can benefit significantly from theoretical developments in the field. In such environments, the ability of cooperating players to coordinate their strategies becomes fundamental for the team to maximize its utility. Two possible approaches to such problems come from Game Theory and Reinforcement Learning, two families of techniques that adopt, respectively, a theoretically oriented approach and a model-free one. The tradeoff lies between having guarantees of convergence to an equilibrium at the cost of limited scalability, and obtaining highly scalable algorithms without the guarantees offered by the theoretical framework of Game Theory. We first give a mathematical formulation of the class of games in which we operate, known as Adversarial Team Games. We then compare Algorithmic Game Theory and Reinforcement Learning by surveying the best-known solution techniques proposed by both disciplines, paying particular attention to their scalability and theoretical properties. Inspired by the described algorithms and by the theoretical notions of equilibria in Adversarial Team Games, we introduce SIMS, a framework for the computation of joint average strategies from a buffer of collected experiences that exploits modern deep learning solutions and can be combined with any RL technique. We also propose a novel approach to the common paradigm of centralized training and decentralized execution, justified by game-theoretic insights. Finally, the framework is compared with other RL techniques.

Interest in the study of settings characterized by the simultaneous presence of multiple rational agents, which interact with one another and with the environment with the goal of maximizing their payoff, has been growing significantly in recent years. This can be explained by the increasingly common use of Artificial Intelligence techniques in everyday practical applications, many of which are intrinsically multi-agent. In environments of this kind, the ability to coordinate actions within a group of cooperating agents becomes fundamentally important for maximizing the team's payoff. Two possible ways of solving a problem of this kind are offered by Game Theory and Reinforcement Learning, two techniques that differ in that the former adopts an approach oriented toward theoretical analysis while the latter is model-free. The tradeoff is between obtaining guarantees of convergence to an equilibrium with limited scalability and gaining advantages in terms of algorithm scalability while giving up a theoretical formalism that provides the basis for guarantees of reaching the solutions defined as equilibria. This thesis first provides a mathematical formulation of the type of games addressed, known as Adversarial Team Games. It then offers a point of view on the comparison between Algorithmic Game Theory and Reinforcement Learning, describing and comparing several algorithms from each, with particular attention to aspects related to scalability and convergence properties. Finally, inspired by the techniques described, it proposes SIMS, a framework for computing joint average strategies among the members of a team, starting from a buffer of experiences collected in the game under study. The presented algorithm exploits modern Deep Learning techniques and can be combined with any Reinforcement Learning technique capable of populating the buffer. Alongside it, a new way of implementing the paradigm of centralized training and decentralized execution is described and justified with notions from Game Theory. The algorithm is then compared experimentally with other RL techniques.