An adaptive machine learning framework for real-time financial fraud detection
Alessi, Gaetano
2024/2025
Abstract
Among the strategies for detecting and mitigating financial crime, the ability to promptly identify suspicious activity within transactional flows plays a central role. This thesis addresses that challenge while taking into account the operational constraints specific to financial institutions. The operating context requires both interpretable insights that support investigators and a system that performs effectively under realistic operational conditions. Particular attention is given to three aspects: the need to self-adapt to evolving fraud patterns, the potential delay in obtaining verified labels for model supervision, and the need for explainable decisions, a fundamental requirement for the system's operational utility. The proposed solution is a fraud detection system based on a multi-stage process that orchestrates diverse detection logics, from deterministic rules to adaptive machine learning models, within a unified risk assessment framework. The system is powered by an efficient data engineering platform that enables real-time analysis of features. To bridge the gap between model prediction and operational action, two methodologies are introduced. First, a mechanism for dynamically managing the decision threshold used by the models controls the precision-recall trade-off. Second, an interpretability framework synthesizes individual feature-importance attributions into high-level concepts, making alerts more useful to investigators. The thesis also introduces and applies an experimental evaluation methodology designed to simulate realistic conditions, particularly label delays. Within this setting, various adaptive learning strategies are compared, from instance-incremental paradigms to batch-incremental approaches.
The aim is to overcome the limitations of conventional fraud detection approaches by operating in a scenario where models are continuously updated, which is essential for keeping their effectiveness aligned with the characteristics of the evolving data stream.

| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2025_07_Alessi_Tesi.pdf | Thesis text | 10.77 MB | Adobe PDF | Accessible online to authorized users only |
| 2025_07_Alessi_Executive_Summary.pdf | Executive Summary | 2.11 MB | Adobe PDF | Accessible online to authorized users only |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/240698