Learning Nash equilibria in simulation-based games : a best arm identification approach

We tackle a fundamental problem in the field of empirical game-theoretic analysis (EGTA), that is learning equilibria of simulation-based games. In such games, an explicit analytical description of the players' utilities is not available, as they are rather specified by means of a game simulator that can be queried to obtain noisy estimates of the utilities. Since running the simulator during the game is not feasible, there is the need to design algorithms that perform a preliminary pure exploration phase to learn the (approximate) equilibria, and then use them to play the game. As a consequence, solving these games calls for algorithms that iteratively sample the players' stochastic utilities, and, then, output some strategy profiles which represent good candidates for being (approximate) equilibria. In our work we study the general case of finding Nash equilibria (NE) in n-player general-sum games. We introduce a multi-armed bandit model and propose four dynamic querying algorithms that employ best arm identification techniques. We work both in the fixed-confidence setting, guaranteeing a desired confidence level while trying to minimize the total number of queries to the simulator, and in the fixed-budget setting, guaranteeing to not exceed the maximum number of queries while trying to reach the highest level of confidence possible. In particular, we propose two algorithms to learn one or more pure NEs in the fixed-confidence setting (PNE-LUCB and LUCB-NG), one algorithm to find a single mixed NE in the fixed-confidence setting (MNE-LUCB), and, finally, one algorithm to find all the pure NEs in the fixed-budget setting (SE). Then, we evaluate all our algorithms against each other. With regard to the search for pure NEs the best performing algorithm resulted to be SE, while, among the two algorithms in the fixed-confidence setting, PNE-LUCB showed to have better performances than LUCB-NG.

Affrontiamo un problema fondamentale nel campo dell'analisi empirica della teoria dei giochi (EGTA), ovvero imparare equilibri in giochi basati su simulazioni. In questo tipo di giochi una descrizione analitica esplicita delle utilità dei giocatori non è disponibile, in quanto esse sono specificate tramite un simulatore che può essere interrogato per ottenerne stime rumorose. Poiché eseguire la simulazione durante il gioco non è possibile, è necessario progettare algoritmi che compiono una fase di pura esplorazione preliminare per imparare gli equilibri (approssimati), e poi usare questi ultimi per giocare il gioco. Di conseguenza, risolvere questi giochi richiede algoritmi che campionino iterativamente l'utilità stocastica dei giocatori e poi producano dei profili di strategie che siano buoni candidati ad essere equilibri (approssimati). Nel nostro lavoro studiamo il caso generale di trovare equilibri di Nash (NE) in giochi con n-giocatori a somma-generale. Introduciamo un modello di tipo multi-armed bandit e proponiamo quattro algoritmi di interrogazione dinamica che impiegano tecniche per l'identificazione della migliore arm. Lavoriamo sia nel setting a confidenza fissata, garantendo un livello di confidenza desiderato mentre cerchiamo di minimizzare il numero totale delle interrogazioni al simulatore, sia nel setting a budget fissato, garantendo di non superare un massimo numero di interrogazioni al simulatore mentre cerchiamo di raggiungere il più alto livello di confidenza possibile. In particolare, proponiamo due algoritmi per imparare uno o più NEs nel setting a confidenza fissata, un algoritmo per trovare un singolo NE misto nel setting a confidenza fissata e, infine, un algoritmo per trovare tutti i NEs puri nel setting a budget fissato. Poi, valutiamo tutti i nostri algoritmi uno contro l'altro. Per quanto riguarda la ricerca di NEs puri, l'algoritmo più performante è risultato essere SE, mentre, tra i due algoritmi nel setting a confidenza fissata, PNE-LUCB ha mostrato di avere prestazioni migliori di LUCB-NG.