An ensemble approach for banking fraud detection

POLITesi - Digital archive of degree and doctoral theses

PAPALE, MICHELE

2019/2020

Abstract

In recent years, banks and financial institutions have moved their business services online, allowing customers to perform transactions through their websites and mobile applications. This has led to an increase in fraud, resulting in the loss of large amounts of money every year. Furthermore, since fraudsters keep seeking new and more sophisticated ways to remain undetected, both academia and industry make a continuous effort to counter this threat. Machine learning has recently become increasingly popular in this domain. However, such approaches are typically used as black boxes and are therefore not interpretable. In this thesis, we propose a novel supervised approach based on an ensemble of three models: Random Forests, XGBoost, and Long Short-Term Memory (LSTM) networks. Additionally, we validate it using LIME, a framework for explaining machine learning models. Our evaluation on real-world data shows that our approach outperforms the state of the art, reaching a detection rate of up to 98.13%. Moreover, we show that the ensemble can detect different types of fraud thanks to the different models it employs. Finally, we study the resistance of our approach against evasion attacks in which the fraudster trains a surrogate of the bank's classifier and uses it to carry out the attack.
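The abstract does not state how the outputs of the three models are combined or exactly how the detection rate is computed. As a minimal sketch, assuming each model emits a fraud probability per transaction, that the ensemble fuses them by averaging (soft voting) with an assumed 0.5 threshold, and that the detection rate is the share of fraudulent transactions that get flagged (true positive rate), the idea can be illustrated as follows. The function names `soft_vote` and `detection_rate`, the threshold, and all scores are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Sequence


def soft_vote(scores_per_model: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Fuse per-model fraud probabilities by a (weighted) average.

    scores_per_model[m][i] is model m's fraud probability for transaction i.
    Equal weights are assumed unless `weights` is given (hypothetical choice).
    """
    if weights is None:
        weights = [1.0] * len(scores_per_model)
    total = sum(weights)
    n_tx = len(scores_per_model[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, scores_per_model)) / total
        for i in range(n_tx)
    ]


def detection_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of fraudulent transactions (label 1) that are flagged (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical scores for four transactions (made-up numbers, not thesis data).
rf_scores = [0.9, 0.2, 0.6, 0.1]    # Random Forest
xgb_scores = [0.8, 0.1, 0.3, 0.2]   # XGBoost
lstm_scores = [0.7, 0.3, 0.9, 0.4]  # LSTM
labels = [1, 0, 1, 1]               # 1 = fraud, 0 = legitimate

fused = soft_vote([rf_scores, xgb_scores, lstm_scores])
flags = [1 if s >= 0.5 else 0 for s in fused]  # assumed decision threshold
print(flags, detection_rate(labels, flags))
```

With these made-up numbers the third transaction is flagged mainly because of its high LSTM score, even though XGBoost scores it low, which illustrates in miniature how combining different models can catch different frauds; the last fraud is missed, so the detection rate is 2/3. Weighted averaging, stacking, or majority voting are equally plausible fusion rules, and only the thesis text would tell which one is actually used.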

Supervisor: ZANERO, STEFANO
Co-supervisor(s): CARMINATI, MICHELE
School / Dept.: ING - School of Industrial and Information Engineering
Date: 6 June 2020
Academic year: 2019/2020
Abstract (translated from the Italian version):
In recent years, banks and financial institutions have made their services available online, allowing customers to perform transactions through their websites and mobile applications. A natural consequence is an increase in fraud, resulting in the loss of large sums of money every year. Moreover, since fraudsters constantly look for new and sophisticated ways to remain undetected, both academia and industry continuously seek to counter this threat. Machine learning has recently become increasingly popular in this domain. However, given the complexity of these approaches, they are used as black boxes.
In this thesis, we propose a novel supervised approach based on an ensemble of three models: Random Forests, XGBoost, and Long Short-Term Memory (LSTM) networks. We also integrate LIME, a framework for explaining machine learning models, into our approach. We evaluate our approach on a dataset provided by an Italian bank: the results show that it outperforms the state of the art, reaching a fraud detection rate of 98.13%. Moreover, we show that the ensemble can detect different types of fraud thanks to the different models it employs. Finally, we study the resistance of our approach against evasion attacks in which the fraudster uses a surrogate of the bank's classifier to commit fraud.
Document type: Master's thesis (Laurea Magistrale)

Attached files

File	Size	Format
2020_06_Papale.pdf (not accessible; description: thesis text)	2.22 MB	Adobe PDF

Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/154201