Learning equilibria in games : a linear best arm identification approach

We deal with a problem in the field of empirical game theoretic analysis. Our problem is to find a max min strategy for the first player in a simulation based game. In simulation based game there is not an explicit description of the utilities of the players, so we need to use a game simulator that can be queried to obtain noisy estimates of the utilities. In particular in BAI setting (best arm identification) the main goal is to find the best arm at the end of the exploration phase. In our case we are working in a linear environment, this means that all the arms available to the players are connected in some way and pulling one of them gives some information about other arms other than pulled arms. In our work we built three algorithms to find the best arm in a two player game. Our setting is modelled as a bi linear setting, making re-use some information about a pull of an arm to make a better estimation of other arms. In particular we built two algo- rithms Super-arm-g and Bilinear-g that pulls each time the best two arms according to the estimation of the rewards and a third one called GAM-g that it is exploiting more the property of a linear setting, pulling sub optimal arms to explore in a better way the environment. Super-arm-g tries to use the information of a pull to make a better esti- mation of the reward of all arms, but in order to do so it needs to transform the space of the setting at the beginning of the algorithm. This operation make the environment bigger, also to perform this transformation it needs more information than the other two algorithms. Other two uses the information coming from a pull only on a sub set of arms.

In questa tesi trattiamo un problema nel campo dell’analisi teorica di giochi. Il nostro problema riguarda trovare la strategia max min ottima in un gioco basato su una simu- lazione. Un gioco basato su una simulazione è un gioco in cui non abbiamo disponibile una esplicita descrizione dell’utilità dei giocatori, ma abbiamo bisogno di usare un sim- ulatore che può essere interpellato per restituire un stima con del rumore dell’utilità. In particolare nello scenario di BAI (identificaione miglior braccio) l’obbiettivo è trovare il braccio migliore per il primo giocatore alla fine della così detta fase di esplorazione. Nel nostro caso specifico ci troviamo in uno scenario di ambiente linear, questo significa che tutte le braccia disponibili ai giocatori sono connesse in qualche modo e tirare una di esse da alcune informazioni riguardo la ricompensa delle altre braccia oltre alla ricompensa del braccio tirato. Nel nostro lavoro abbiamo costruito tre algoritmi per trovare il braccio migliore da giocare in un gioco con solo due giocatori. Il nostro scenario è modellato con una relazione bilin- eare tra la ricompensa, le braccia e l’ambiente, rendendo possibile usare le informazioni riguardo il tirare un braccio per fare una stima migliore delle altre braccia. In particolare abbiamo costruito due algoritmi Super-arm-g e Bilinear-g che tirano ad ogni iterazione dell’algoritmo il biglior braccio ed il secondo migliore, secondo le loro stime della ricom- pensa associata ad ogni braccio e un terzo algoritmi GAM-g che sfrutta maggiormente le proprietà di uno scenario lineare, tirando braccia sub ottime per esplorare meglio la conformazione dell’ambiente. Super-arm-g cerca di usare le informazioni che provengono dal tirare un braccio per avere una migliore stima del premio di tutte le braccia, ma per poter fare ciò necessita di trasformare lo spazio dello scenario all’inizio dell’iterazioni dell’algoritmo. Questa operazione ingrandisce lo spazio matematico dello scenario, ed in- oltre per poter eseguire questa trasformazione necessita di piu informazioni degli altri due algoritmi. Gli altri due usano le informazioni ricevute dal tirare un braccio per migliorare le stime soltanto di una parte delle altre braccia.