Comparative analysis of natural language processing and gradient boosting trees approaches for fraud detection

As digital payments grow in popularity, fraud detection systems have become indispensable for limiting monetary losses for both customers and card issuers, and online payment platforms now build such systems directly into their infrastructure. This thesis presents a straightforward approach to fraud detection inspired by natural language processing (NLP) techniques. The proposed method first applies the Continuous Bag-of-Attributes (CBOA) neural network embedding, which projects transactional data into a high-dimensional space; this yields a richer set of features and lets the system capture contextual relationships within the data. The embedded data is then processed by a Long Short-Term Memory (LSTM) layer, which allows the model to capture temporal correlations between sequential transactions. Finally, two dense layers act as the decision-making component of the model, classifying each transaction as fraudulent or legitimate. The thesis then directly compares the NLP-based approach with Gradient Boosting Trees (GBT), an ensemble method built on decision trees. This comparison matters because tree-based methods have proven highly effective in credit card fraud detection. The evaluation covers both the conventional GBT approach and its performance when augmented with NLP embeddings.
Experimental results are presented. More advanced techniques, such as transformers and attention mechanisms, may surpass the methods studied here: they are expressive enough to capture intricate patterns and dependencies within transactional data, which could further improve the accuracy of fraud detection systems. Overall, this thesis contributes a direct comparison between a novel approach and a classic approach to fraud detection, and highlights the potential for further advances through more sophisticated techniques.
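
The pipeline described above (attribute embedding, then an LSTM layer, then two dense layers) can be illustrated with a minimal forward-pass sketch in NumPy. This is not the implementation used in the thesis: the class name, layer sizes, ReLU and sigmoid activations, and the encoding of each transaction as a fixed number of integer attribute ids are all assumptions made for illustration, and the embedding table is randomly initialised rather than trained with CBOA.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SequenceFraudModel:
    """Forward pass only: embedding lookup -> LSTM -> two dense layers.

    Hypothetical sketch; all sizes and activations are assumptions.
    """

    def __init__(self, vocab_size, n_attrs, embed_dim=8, hidden=16, dense=8):
        # Embedding table. In the thesis this role is played by the CBOA
        # embedding; here it is random (assumption, no training shown).
        self.E = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
        in_dim = n_attrs * embed_dim
        # LSTM weights for the input, forget, cell and output gates, stacked.
        self.W = rng.normal(0.0, 0.1, (in_dim + hidden, 4 * hidden))
        self.b = np.zeros(4 * hidden)
        # Two dense layers: hidden -> dense -> 1.
        self.W1 = rng.normal(0.0, 0.1, (hidden, dense))
        self.b1 = np.zeros(dense)
        self.W2 = rng.normal(0.0, 0.1, (dense, 1))
        self.b2 = np.zeros(1)
        self.hidden = hidden

    def forward(self, seq):
        """seq: int array of shape (timesteps, n_attrs) holding attribute ids.

        Returns a score in (0, 1) for the last transaction in the sequence.
        """
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        for tokens in seq:
            # Concatenate the embeddings of all attributes of one transaction.
            x = self.E[tokens].reshape(-1)
            z = np.concatenate([x, h]) @ self.W + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        a = np.maximum(0.0, h @ self.W1 + self.b1)       # first dense layer (ReLU)
        return float(sigmoid(a @ self.W2 + self.b2)[0])  # second dense layer


# Usage: five consecutive transactions, each described by four attribute ids.
model = SequenceFraudModel(vocab_size=50, n_attrs=4)
history = rng.integers(0, 50, size=(5, 4))
score = model.forward(history)
```

With untrained weights the score is meaningless; the sketch only shows how data flows through the three stages. In the GBT variant mentioned above, the embedded attributes would instead be passed as input features to a tree ensemble.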

With the growing popularity of so-called digital payments, fraud detection systems have become indispensable in limiting monetary losses for both customers and card-issuing companies. Aware of the importance of this problem, online payment platforms incorporate increasingly innovative systems into their infrastructure. This thesis therefore presents an approach to banking fraud detection inspired by Natural Language Processing (NLP) techniques. The presented algorithm is divided into two phases. In the first phase, the Continuous Bag-of-Attributes (CBOA) neural network projects the transaction data into a high-dimensional space, facilitating the extraction of features that capture the contextual relationships within the data. Next, the transformed data is processed by a Long Short-Term Memory (LSTM) layer, allowing the model to capture temporal correlations between sequential transactions. Then, to classify transactions as fraudulent or legitimate, the processed data passes through two dense layers, which act as the final decision-making components of the model. Lastly, this thesis makes a direct comparison between the NLP-based approach and Gradient Boosting Trees (GBT), an ensemble method built on decision trees. This comparison is significant because tree-based methods have proven to be highly effective in credit card fraud detection. The evaluation covers both the conventional GBT approach and an exploration of its performance when augmented with NLP embeddings. Experimental results are presented; however, more advanced techniques, such as Transformers and methods based on the attention mechanism, promise to surpass the current methods.
Such techniques have the complexity needed to capture more intricate dependencies within transactions, further strengthening the effectiveness of fraud detection systems. Overall, this thesis offers a valuable contribution to the field of banking fraud detection, presenting a direct comparison between an innovative approach and a classic approach to the problem and highlighting possible developments in the field through the use of more advanced techniques.