Understanding deep neural networks via propagation perturbation: from sample regularity to model security
REN, TAO
2024/2025
Abstract
Deep neural networks (DNNs) excel at extracting patterns from data but remain opaque regarding how individual samples and global model behavior jointly emerge from training. This thesis introduces Propagation Perturbation (ProP), a unified framework that injects controlled Gaussian noise into activation functions at inference time. ProP offers two complementary perspectives. At the sample level, the method analyzes how the model’s output probability for a specific sample changes under perturbations, quantifying the stability of predictions and highlighting irregular or privacy-sensitive cases. At the model level, ProP identifies a model-specific output probability distribution, a stationary signature that exposes backdoor implants and other structural vulnerabilities without requiring optimization or access to training data. Theoretical analysis explains how ProP decomposes predictions into input-dependent and model-specific components, while extensive experiments on diverse datasets and architectures confirm its ability to connect memorization, generalization, and security. Beyond vision tasks, ProP extends to language models, supporting investigations of privacy leakage and jailbreak behaviors. Together, these results demonstrate that ProP provides a lightweight, interpretable, and broadly applicable lens for understanding and safeguarding deep learning systems.
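As a rough illustration of the mechanism described in the abstract, the sketch below (PyTorch) adds Gaussian noise to activation outputs via forward hooks and collects the resulting output probabilities for a given input. This is not the thesis implementation: the helper names, the assumption that activations are `nn.ReLU` modules, the noise scale `sigma`, and the number of trials are all illustrative choices.

```python
# Minimal sketch of ProP-style activation perturbation at inference time.
# Assumptions (not from the thesis): activations are nn.ReLU modules,
# noise is i.i.d. Gaussian with scale `sigma`, stability is read off the
# spread of softmax outputs over repeated noisy forward passes.
import torch
import torch.nn as nn
import torch.nn.functional as F

def add_activation_noise_hooks(model: nn.Module, sigma: float):
    """Register forward hooks that add Gaussian noise to every ReLU output."""
    handles = []
    for module in model.modules():
        if isinstance(module, nn.ReLU):
            def hook(_module, _inputs, output, sigma=sigma):
                return output + sigma * torch.randn_like(output)
            handles.append(module.register_forward_hook(hook))
    return handles

@torch.no_grad()
def perturbed_output_probabilities(model: nn.Module, x: torch.Tensor,
                                   sigma: float = 0.1, n_trials: int = 100):
    """Run repeated noisy forward passes and collect softmax outputs for x."""
    model.eval()
    handles = add_activation_noise_hooks(model, sigma)
    try:
        probs = torch.stack(
            [F.softmax(model(x), dim=-1) for _ in range(n_trials)]
        )  # shape: (n_trials, batch, num_classes)
    finally:
        for h in handles:
            h.remove()
    return probs
```

Under these assumptions, the sample-level view corresponds to summarizing `probs` for one input (e.g. the variance of the predicted-class probability), while the model-level view would aggregate such statistics over many inputs to estimate the model-specific output distribution mentioned in the abstract.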
| File | Size | Format |
|---|---|---|
| 2025_10_Ren.pdf (available online to everyone from 22/09/2026) | 12.2 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/242806