Gambling addiction, a machine learning approach

In recent years, the rapid evolution of new technologies in electronics and telephony has been accompanied by a significant increase in the number of people affected by gambling addiction, especially among the young. The ability to play remotely, away from prying eyes, and the reduced perception of losing money when paying by credit card have created a breeding ground for gambling addiction. The purpose of this thesis is to analyze the behavior of gamblers in order to identify pathological ones early on. In a first phase, various machine learning models were evaluated; the best-performing one was Random Forest, with an AUC of about 0.856. In the next phase, we enriched the model with the features Duration and weekEnd, reaching an AUC of 0.860. Subsequently, we derived three new features, A, T, and D, from our own formulation, which is similar to the gambler's ruin theorem. Adding these features to the model did not improve performance, but it kept performance unchanged while reducing the number of required features from 83 to 26. The last experiment analyzed how the Random Forest scores vary with the amount of data and with the length of the data window. The results show that machine learning techniques can be used to identify at-risk subjects from the earliest stages of the disease, using a minimal amount of data.
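The AUC values quoted above can be read as the probability that a randomly chosen pathological gambler receives a higher model score than a randomly chosen non-pathological one. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc` and the example labels and scores are illustrative assumptions, not data or code from the thesis; in practice the scores would be the positive-class probabilities produced by a classifier such as Random Forest.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive example has the higher score.
    Tied pairs count as half a correct ranking."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical example: 1 = pathological, 0 = non-pathological.
    labels = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
    print(f"AUC = {roc_auc(labels, scores):.3f}")
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic pairwise loop is chosen for clarity; library implementations typically use a sort-based computation instead.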
