Internet banking dataset generator for fraud detection benchmarking

MARIANI, EMANUELE

2014/2015

Abstract

The growing popularity of online banking services in recent years has brought with it an increase in fraud carried out through cyber attacks (e.g., malware, phishing, or trojans). These attacks aim to steal as much money as possible. To remain undetected, fraudsters continually look for new methods to perpetrate their crimes. For these reasons, research in fraud detection is constantly evolving. However, the sharing of real transactional data within the research community is severely limited by the privacy and security constraints of the banking context. Furthermore, few tools exist for generating synthetic data. In this thesis we present BankDataGen, a system for generating synthetic Internet banking transactions that uses data mining techniques to identify the most important features of an authentic dataset and reproduce them. Starting from a real dataset made available by an Italian banking group, we extract user profiles. Using these profiles, we perform clustering based on principal components, which allows us to group users by spending pattern and to extract their most relevant attributes. Finally, we apply distribution fitting techniques to the selected attributes. Notably, the synthetic dataset is not limited to the period covered by the real input data: forecasting methods make it possible to estimate the past and future trends of the transaction distribution. The final output of BankDataGen is a synthetic dataset that reflects the characteristics of the real one. In addition, fraudulent transactions can be inserted, generated on the basis of typical attacks against online banking users. Lastly, we implement a web application that provides a tool for generating synthetic datasets whose characteristics are borrowed from a real dataset.
We perform comparative tests between the reference data and the generated data to assess the quality of the results. The results are good, with a generally high degree of similarity between original and synthetic data.
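
The distribution fitting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which distributions or attributes BankDataGen fits, so everything below is an assumption made for illustration: the log-normal model for transaction amounts, the sample values, and the helper names `fit_lognormal` and `generate_amounts` are all hypothetical and are not taken from the thesis.

```python
import math
import random
import statistics


def fit_lognormal(amounts):
    """Maximum-likelihood log-normal fit: mean and std of the log-amounts."""
    logs = [math.log(a) for a in amounts]
    mu = statistics.fmean(logs)
    # The population standard deviation is the MLE for sigma.
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def generate_amounts(mu, sigma, n, seed=None):
    """Draw n synthetic amounts from the fitted log-normal distribution."""
    rng = random.Random(seed)
    return [round(rng.lognormvariate(mu, sigma), 2) for _ in range(n)]


# Hypothetical transaction amounts for one user cluster (not real data).
observed = [12.50, 40.00, 75.90, 120.00, 18.30, 250.00, 33.10, 60.00]

mu, sigma = fit_lognormal(observed)
synthetic = generate_amounts(mu, sigma, n=1000, seed=42)

print(f"fitted mu={mu:.3f}, sigma={sigma:.3f}")
print(f"observed median={statistics.median(observed):.2f}, "
      f"synthetic median={statistics.median(synthetic):.2f}")
```

Comparing summary statistics of the observed and generated samples, as the last lines do, is a simplified stand-in for the comparative tests the abstract describes. A real pipeline would presumably fit each selected attribute per user cluster and choose among several candidate distributions.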

Advisor: ZANERO, STEFANO
Co-advisor(s): CARMINATI, MICHELE; MAGGI, FEDERICO
School / Dept.: ING - Scuola di Ingegneria Industriale e dell'Informazione
Date: 28-Apr-2016
Academic year: 2014/2015
Document type: Tesi di laurea Magistrale (Master's thesis)
Appears in collections: Tesi di laurea Magistrale (Master's thesis)

Attached files

File	Size	Format
2016_04_Mariani.pdf (accessible online to authorized users only; description: Thesis - final version)	4.2 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/119228