Improving poisoning attacks against banking fraud detection systems

To counter the steadily increasing number of banking frauds, banks and financial institutions develop data-driven Fraud Detection Systems (FDSs), advanced protection systems based on Machine Learning (ML) algorithms. Although automated FDSs have demonstrated excellent results, it has been shown that they can be deceived and corrupted by Adversarial Machine Learning (AML) techniques, which aim to trick Artificial Intelligence (AI) models by feeding them deceptive and corrupting inputs. In particular, previous works have shown that FDSs are vulnerable to evasion attacks, which operate on the test set of the ML model, and to poisoning attacks, which manipulate its training set. In this work, we extend and improve the application of poisoning attacks to the banking fraud domain. We present a novel approach to generating fraudulent samples based on the statistical analysis of the victims' past transactions, and we introduce ensembling techniques to create a reliable Oracle, i.e., an ML tool that validates the adversary's frauds. Using specific metrics, we evaluate the impact of poisoning attacks on eight models: Random Forest, XGBoost, Light Gradient Boosting, CatBoost, Support Vector Machine, Artificial Neural Networks, Logistic Regression, and Active Learning. We conduct our experiments in three scenarios that reflect the attacker's knowledge of the target FDS: White Box (perfect knowledge), Grey Box (partial knowledge), and Black Box (no knowledge). The attacker can mount poisoning attacks by following three distinct strategies: poisoning the amount, poisoning the count (i.e., the number of transactions per iteration), or poisoning both. Each strategy has a conservative and a greedy version and is evaluated under both a weekly and a bi-weekly update policy, i.e., how often the detectors are retrained to include new samples.
Moreover, we provide an in-depth analysis of the feature regeneration process, which allows the adversary to change the features of the transactions during an attack. Our experiments show that our Oracle is extremely reliable, even in the Black Box scenario, where it is trained with just 50 features. In several cases, the Oracle allows the adversary to mount poisoning attacks without being noticed. In particular, we are able to keep the attack detection rate very low, sometimes zero, even with foreign frauds, which have a higher suspicion level. We also show that poisoning only the amount is beneficial, especially against foreign users and detectors trained according to a bi-weekly update policy. On the other hand, we point out that poisoning the count is more complicated and less cautious. In conclusion, our approach allows the attacker to steal a considerable amount of money even with no knowledge of the target system.
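
The ensembling idea behind the Oracle can be illustrated with a minimal sketch. The abstract does not state how the individual models are combined, so the majority vote used here is an assumption, and the three rule-based detectors, the feature names, and the thresholds are hypothetical stand-ins for trained models rather than anything taken from this work.

```python
from typing import Callable, Dict, Sequence

Transaction = Dict[str, float]
Detector = Callable[[Transaction], bool]  # True means "flagged as fraud"


def oracle_flags(tx: Transaction, detectors: Sequence[Detector],
                 threshold: float = 0.5) -> bool:
    """Return True when more than `threshold` of the detectors flag `tx`."""
    if not detectors:
        raise ValueError("at least one detector is required")
    votes = sum(1 for detect in detectors if detect(tx))
    return votes / len(detectors) > threshold


# Hypothetical rule-based stand-ins for trained models (assumptions).
def amount_rule(tx: Transaction) -> bool:
    return tx["amount"] > 1000.0


def count_rule(tx: Transaction) -> bool:
    return tx["daily_count"] > 5


def foreign_rule(tx: Transaction) -> bool:
    return tx["is_foreign"] == 1.0 and tx["amount"] > 300.0


detectors = [amount_rule, count_rule, foreign_rule]
small = {"amount": 200.0, "daily_count": 2.0, "is_foreign": 0.0}
large = {"amount": 5000.0, "daily_count": 9.0, "is_foreign": 1.0}

print(oracle_flags(small, detectors))  # no detector flags it -> False
print(oracle_flags(large, detectors))  # all three flag it -> True
```

Combining several models in this way makes the overall verdict less sensitive to the errors of any single model, which is the usual motivation for ensembling; a weighted vote or averaged scores would be equally plausible choices.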
