Federated Isolation of Layer-critical Updates (FILU): client-side defense against backdoor attacks via layer-critical update isolation
BARABINO, FRANCESCO
2024/2025
Abstract
Federated learning is a distributed machine learning paradigm in which both data storage and training are delegated to clients, thereby enhancing privacy and enabling a decentralized learning process. While this design strengthens data confidentiality and reduces reliance on a central authority, it also expands the attack surface, since both training and data are outsourced to potentially untrusted clients. Among the most insidious threats in this context are backdoor attacks. Such attacks embed an auxiliary, attacker-chosen task within the model that remains dormant on normal inputs but is activated by a specific trigger, causing the model to produce malicious predictions without degrading performance on the main task. A variety of server-side defenses have been proposed in the literature to counter backdoor attacks, relying on aggregation strategies, anomaly detection, or model sanitization. These defenses, however, share a common limitation: they assume the server is trustworthy. In this work we introduce, to the best of our knowledge, the first fully client-side defense that mitigates backdoor attacks while preserving main-task accuracy. Our approach builds on the observation that backdoor information tends to concentrate in specific layers of a neural network, referred to in prior literature as backdoor-critical layers. Although the notion of backdoor-critical layers has often been acknowledged, no comprehensive analysis of it has yet been conducted; moreover, the concept remains unexplored in the context of defenses, even though many defense strategies implicitly rely on parameters associated with backdoors in a broader sense. By selectively targeting backdoor-critical layers, our method suppresses backdoor behavior with minimal disruption to clean performance. Complementing this defense, we present the first systematic analysis of backdoor-critical layers in modern image classification architectures. Our study examines how backdoor sensitivity varies across architectures, datasets, and trigger patterns, providing new insights into the structural vulnerabilities of federated learning models.
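To make the notion of backdoor-critical layers concrete, the sketch below illustrates one common way of estimating them: layer-by-layer parameter substitution between a clean reference model and a suspect (potentially backdoored) model, scoring each layer by how much its substitution changes the attack success rate on triggered inputs. This is a minimal illustration under assumed inputs, not the FILU implementation described in the thesis; the names `clean_model`, `suspect_model`, `triggered_loader`, and `target_label` are hypothetical placeholders.

```python
# Illustrative sketch only (not the thesis implementation): estimating which layers are
# "backdoor-critical" via layer-wise substitution. Each parameter (or buffer) tensor of a
# suspect model is swapped, one at a time, into a clean reference model, and the change
# in attack success rate on triggered inputs is measured. Tensors whose substitution
# raises the attack success rate the most are flagged as backdoor-critical.
import copy
import torch


@torch.no_grad()
def attack_success_rate(model, triggered_loader, target_label, device="cpu"):
    """Fraction of triggered inputs classified as the attacker-chosen target label."""
    model.eval()
    hits, total = 0, 0
    for x, _ in triggered_loader:  # labels are ignored; only the trigger matters here
        x = x.to(device)
        preds = model(x).argmax(dim=1)
        hits += (preds == target_label).sum().item()
        total += x.size(0)
    return hits / max(total, 1)


@torch.no_grad()
def backdoor_critical_scores(clean_model, suspect_model, triggered_loader,
                             target_label, device="cpu"):
    """Per-layer score: increase in attack success rate when a single tensor of the
    clean model is replaced by the corresponding tensor of the suspect model."""
    clean_sd, suspect_sd = clean_model.state_dict(), suspect_model.state_dict()
    base_asr = attack_success_rate(clean_model, triggered_loader, target_label, device)
    scores = {}
    for name in clean_sd:
        probe = copy.deepcopy(clean_model)       # fresh copy for each substitution
        sd = probe.state_dict()
        sd[name] = suspect_sd[name].clone()      # substitute one tensor from the suspect
        probe.load_state_dict(sd)
        asr = attack_success_rate(probe, triggered_loader, target_label, device)
        scores[name] = asr - base_asr            # large increase => backdoor-critical
    return scores
```

Under such a scoring, the layers whose substitution most increases the attack success rate would be treated as backdoor-critical; a client-side defense in the spirit described above could then handle those layers separately while leaving the rest of the model, and hence main-task accuracy, largely untouched.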
| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2025_10_Barabino_Tesi.pdf | Thesis text | 1.03 MB | Adobe PDF | Available online to everyone from 30/09/2028 |
| 2025_10_Barabino_Executive_Summary.pdf | Executive summary text | 493.49 kB | Adobe PDF | Available online to everyone from 30/09/2028 |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/243900