Explainability and transparency of CNNs have become essential factors to establish trust in their predictions, detect biases, and better understand failure cases. Concept-based explainability methods provide explanations in terms of high-level human-interpretable concepts, investigating the relationship between concepts and predictions. However, the need to manually define concepts often represents a limitation of these approaches, as it requires human labour and domain knowledge. In this work, we try to fill this gap with Activation Based Concepts, a method to automatically extract coherent and distinct visual concepts from a set of test images. We introduce a novel technique to extract relevant segments from images by analysing feature map activations. Additionally, we improve on existing concept extraction methods by introducing heuristics to automatically choose the number of concepts to extract, remove noisy concepts and merge similar ones, further reducing the amount of required supervision. Extracted concepts can be used to compute concept importance scores for predictions and to generate both local and global explanations. We conduct an experiment with human subjects to demonstrate the quality of concepts extracted with our method, showing that it performs better than CRAFT, a state-of-the-art concept extraction technique.
Explainability and transparency of CNNs are essential factors for establishing trust in their predictions, detecting discrimination, and better understanding errors and failure cases. Concept-based explainability methods provide explanations in terms of high-level interpretable concepts, studying the relationships between concepts and predictions. However, the need to define concepts manually can be a limitation of these approaches, as it requires human labour and domain knowledge. To address this need, we propose Activation Based Concepts, a method for automatically extracting distinct and coherent visual concepts from a set of test images. We introduce a new technique for extracting relevant segments from images by analysing feature map activations. In addition, we improve on existing concept extraction methods by introducing heuristics to automatically choose the number of concepts to extract, remove noisy concepts, and merge similar ones, further reducing the supervision required. The extracted concepts can be used to compute importance scores for predictions and to generate both local and global explanations with Visual-TCAV. We conduct an experiment with human subjects to demonstrate the quality of the concepts extracted with our method, showing that it performs better than CRAFT, a state-of-the-art concept extraction technique.
Activation based concept extraction for post-hoc explainability of CNN models
Merengo, Sara
2023/2024
File | Description | Size | Format | Access
---|---|---|---|---
2024_12_Merengo_Thesis.pdf | Thesis | 35.49 MB | Adobe PDF | Openly accessible online
2024_12_Merengo_ExecutiveSummary.pdf | Executive Summary | 2.37 MB | Adobe PDF | Openly accessible online
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/231329