Clustering-based opponent modelling for opponent exploitation in sequential games under utility constraints

In recent years the incredible results of automatic decision-making techniques are due to artificial intelligence that has led to algorithms capable of defeating the best players in the world in their relative games. These techniques usually do not take into account the behavior of a specific player but they identify the best way to play against a generic player. In this thesis, we will instead study algorithms that try to model the player and then identify a strategy capable of exploiting the information obtained. In particular, we will analyze a recent study in which an algorithm is developed to ensure that the player’s expected utility is within certain constraints during the entire game. This algorithm offers interesting theoretical results but presents some practical problems. To solve these problems we will define a clustering algorithm capable of using the information obtained from games with previous players to improve the performance in the early stages of execution. We will also modify the opponent modeling technique used in the standard algorithm with one that allows us to have a narrower confidence region with an equal number of collected observations. Finally, we will make an empirical evaluation of the improvements obtained by these techniques through the use of a standard benchmark of sequential games: the Leduc Poker. In this phase, we will also use real data collected using a web app.

Negli ultimi anni gli incredibili risultati delle tecniche decisionali automatiche sono dovute all’intelligenza artificiale che ha portato ad algoritmi capaci di sconfiggere i migliori giocatori del mondo. Queste tecniche solitamente non tengono in considerazione il comportamento del giocatore ma individuano il miglior modo di giocare contro un qualsiasi giocatore. In questa tesi andremo invece a studiare algoritmi che cercano di modellare il giocatore per poi individuare una strategia capace di exploitare le informazioni ottenute. In particolare andremo ad analizzare un recente studio nel quale viene sviluppato un algoritmo capace di assicurare che il guadagno del giocatore sia all’interno determinati vincoli durante l’intera sua fase di gioco. Questo algoritmo offre interessanti risultati teorici ma presenta alcune problematiche pratiche nel suo utilizzo. Per risolvere queste problematiche andremo a definire un clustering algorithm capace di utilizzare le informazioni ottenute dalle partite con giocatori precedenti per migliorare le prestazioni nelle prime fasi di esecuzione. Modificheremo inoltre la tecnica di opponent modelling utilizzata dall’algoritmo standard con una che ci permette di avere una confidence region più stretta a parità di osservazioni raccolte. Andremo infine ad effettueremo una valutazione empirica dei miglioramenti ottenuti da queste tecniche di opponent modelling attraverso l’utilizzo di un benchmark standard dei giochi sequenziali: il Leduc Poker. In questa fase utilizzeremo anche dati di partite reali raccolti grazie ad una web app da noi realizzata.