Leveraging the student-teacher learning framework for semi-supervised visual anomaly detection and segmentation

Visual anomaly detection aims to detect and segment anomalous regions in images. The task has attracted considerable attention from industry because it offers valuable solutions in many domains (e.g. defect detection in manufacturing, medical image analysis, and intelligent surveillance for security systems). A wide range of machine-learning-based techniques address the semi-supervised version of this task, in which the model is trained on nominal (anomaly-free) samples only. In recent years, many deep learning techniques have reached state-of-the-art (SOTA) results thanks to the strong feature-extraction capabilities of CNNs. Methods based on Student-Teacher learning have already proven very promising, and since these approaches to visual anomaly detection are relatively new, there is still considerable room for improvement. They rely on the discrepancy between the feature maps of a powerful Teacher and those of a weaker Student: because the Student learns to reproduce the Teacher's features on nominal samples only, the two are expected to diverge mainly on anomalous regions. We propose two solutions that leverage the Student-Teacher framework. In the first, we use an ensemble of Students to define a direct measure of the ensemble's uncertainty (Feature Vectors Variance) and add it to the anomaly score. In the second, to increase the feature-map discrepancy, we slim down the Student's architecture to create a bottleneck, so that only the knowledge essential to representing nominal features is distilled from the Teacher. Finally, we show that both solutions obtain promising results in terms of image-level AUROC and Dice score on MVTec AD.
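The abstract describes the first solution only at a high level. The NumPy sketch below shows one way its two ingredients could be combined per pixel: the regression error between the Students' mean prediction and the Teacher, plus an ensemble-variance term added to it. It assumes Teacher and Student feature maps have already been extracted at the same resolution. The exact formula used for Feature Vectors Variance (mean squared distance of each Student's feature vector from the ensemble mean), the weight `lam`, and the max-over-pixels image score are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np


def anomaly_map(teacher_feats: np.ndarray, student_feats: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Per-pixel anomaly score from a Teacher and an ensemble of Students (sketch).

    teacher_feats: (H, W, D) Teacher feature map.
    student_feats: (M, H, W, D) feature maps of M Students at the same resolution.
    lam: hypothetical weight of the variance term (1.0 = plain sum).
    """
    if student_feats.shape[1:] != teacher_feats.shape:
        raise ValueError("Student and Teacher feature maps must have matching shapes")
    # Ensemble prediction: mean Student feature vector at each pixel -> (H, W, D).
    mean_student = student_feats.mean(axis=0)
    # Regression error: squared L2 distance between ensemble mean and Teacher -> (H, W).
    regression_error = np.sum((mean_student - teacher_feats) ** 2, axis=-1)
    # ASSUMED Feature Vectors Variance: average squared L2 distance of each
    # Student's feature vector from the ensemble mean -> (H, W).
    fvv = np.mean(np.sum((student_feats - mean_student) ** 2, axis=-1), axis=0)
    return regression_error + lam * fvv


def image_score(amap: np.ndarray) -> float:
    # ASSUMED image-level aggregation: the maximum pixel score.
    return float(amap.max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 8, 16))
    # Three Students that copy the Teacher exactly...
    students = np.repeat(teacher[None], 3, axis=0)
    # ...except at one location, where they disagree with each other.
    students[0, 4, 4] += 1.0
    students[1, 4, 4] -= 1.0
    amap = anomaly_map(teacher, students)
    idx = np.unravel_index(amap.argmax(), amap.shape)
    print(tuple(int(i) for i in idx))  # (4, 4)
    print(round(image_score(amap), 3))  # 10.667
```

In this toy example the Students still agree with the Teacher *on average* at location (4, 4), so the regression error alone stays near zero there; it is the variance term that exposes their disagreement, which is the intuition behind adding an uncertainty measure to the score.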

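The two evaluation metrics named in the abstract measure different things: image-level AUROC rates how well one anomaly score per image separates anomalous from nominal images, while the Dice score measures the overlap between a predicted anomaly mask and the ground-truth mask. A minimal NumPy sketch of both follows. How anomaly maps are reduced to per-image scores and binarized into masks (for example, the threshold) is not specified here, so the `> 0.5` threshold in the demo is purely illustrative.

```python
import numpy as np


def image_auroc(scores, labels) -> float:
    """Image-level AUROC via the Mann-Whitney U statistic.

    scores: one anomaly score per test image (higher = more anomalous).
    labels: 1 for anomalous images, 0 for nominal ones.
    A tie between an anomalous and a nominal score counts as half a correct ranking.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one anomalous and one nominal image")
    # Compare every anomalous score with every nominal score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def dice(pred_mask, gt_mask, eps: float = 1e-8) -> float:
    """Dice score 2|P & G| / (|P| + |G|) between binary segmentation masks."""
    p = np.asarray(pred_mask, dtype=bool)
    g = np.asarray(gt_mask, dtype=bool)
    inter = np.logical_and(p, g).sum()
    # eps keeps the ratio defined (and equal to 1) when both masks are empty.
    return float((2.0 * inter + eps) / (p.sum() + g.sum() + eps))


if __name__ == "__main__":
    print(image_auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
    amap = np.array([[0.1, 0.7], [0.9, 0.2]])
    gt = np.array([[0, 1], [1, 1]])
    print(round(dice(amap > 0.5, gt), 3))  # 0.8
```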