Sequence form based multi-agent learning algorithms for extensive form games
MANINO, EDOARDO
2013/2014
Abstract
This thesis discusses the use of the sequence form to implement multi-agent learning algorithms for extensive-form games. Learning in a multi-agent system is a challenging task, and the literature has focused mainly on simple normal-form models. However, the transformation of an extensive-form game to the normal form, though possible in theory, produces an exponential increase in the size of the representation, making the use of most state-of-the-art algorithms impractical. Inspired by recent work on Q-learning, we propose a design methodology to translate normal-form algorithms to the sequence form, taking advantage of its linear size w.r.t. the corresponding extensive form. We then apply our design to Cross' learning, Generalized Infinitesimal Gradient Ascent (GIGA) and its win-or-learn-fast variant GIGA-WoLF. The dynamics of the proposed algorithms in the continuous-time model show a strong similarity, but no exact equivalence, with their normal-form counterparts. In addition to the mathematical analysis, we present an empirical evaluation of the relative performance of our algorithms against Counterfactual Regret Minimization and Q-learning over a large number of sequence-form games.
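To make the size claim above concrete, consider an illustrative player with k decision points, each offering b actions (the numbers below are our own example, not taken from the thesis). The normal form enumerates one pure strategy per assignment of an action to every decision point, while the sequence form keeps one variable per (decision point, action) pair plus the empty sequence:

```latex
\[
  \underbrace{b^{k}}_{\text{pure strategies (normal form)}}
  \qquad \text{vs.} \qquad
  \underbrace{kb + 1}_{\text{sequences (sequence form)}}
\]
```

For k = 20 decision points with b = 2 actions each, that is 2^20 (about one million) pure strategies against only 41 sequences.

As a second illustration, the GIGA update mentioned above is, in its normal-form version, a projected gradient step on the player's mixed strategy. The sketch below is a minimal reconstruction under that assumption; the function names and the projection routine are illustrative choices, not code from the thesis.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex,
    using the standard sort-and-threshold algorithm."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)                        # running sums
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)      # shift that normalizes the sum to 1
    return np.maximum(v - theta, 0.0)

def giga_step(x, payoff_gradient, eta):
    """One normal-form GIGA update: unconstrained gradient ascent on the
    expected payoff, then projection back onto the simplex."""
    return project_to_simplex(x + eta * payoff_gradient)

x = np.ones(3) / 3                            # uniform mixed strategy over 3 actions
x = giga_step(x, np.array([1.0, 0.0, 0.0]), eta=0.1)
```

In the sequence form, the natural analogue of this step would project onto the polytope of realization plans rather than the simplex, which is where the linear size advantage enters.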
Full text: 2015_04_Manino.pdf (Adobe PDF, 649.79 kB, not accessible).
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/103504