Attention methods in remote sensing scene classification: the case of illegal landfills

Illegal landfills have become one of the most profitable businesses for criminal organizations and a growing burden on the economy, the environment and, above all, the health of citizens. To identify these sites early and prevent damage, ongoing research is focusing on automating the detection of illegal dumps. One of the more recent approaches uses Deep Learning models based on Convolutional Neural Networks (CNNs), which could enable large-scale territory monitoring campaigns. In particular, the ResNet50 architecture has already delivered good results. This thesis evaluates the effects of adding attention mechanisms to that network. These are techniques designed to enhance a CNN by focusing its computational resources on the most significant parts of the input data. Applied to illegal dump detection, they could help identify more precisely the individual waste objects present in the images. Among these mechanisms, Squeeze-and-Excitation (SE), the Convolutional Block Attention Module (CBAM) and Efficient Channel Attention (ECA) were implemented on top of the existing ResNet50 architecture. Several configurations of these models were tested, and a quantitative and qualitative evaluation showed ECA to be the best option, yielding the largest improvement in classification performance. Finally, Class Activation Maps (CAMs), heatmaps designed to highlight the areas of an image that contribute most to the classification, were analyzed to better understand each model's capacity to identify the relevant objects in an illegal landfill scene. An ad hoc dataset was created with annotations of the relevant waste objects, and a quantitative analysis compared the CAMs with these ground truths. The results indicated that CAMs could be a first step towards weakly supervised object detection and that, consistent with the previous results, ECA is the most effective attention module.
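
To make the channel-attention idea concrete, the following is a minimal pure-Python sketch of an ECA-style gate, not the implementation used in the thesis. It assumes the usual ECA formulation: global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid, and a per-channel rescaling. The function names, the nested-list feature-map layout (C x H x W) and the fixed `weights` kernel are hypothetical; in a real network the kernel would be learned and the operation would run on tensors inside each ResNet50 block.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size; gamma=2 and b=1 are assumed defaults."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_attention(feature_map, weights):
    """Rescale each channel of a C x H x W nested list by an ECA-style gate.

    weights is a 1D kernel of odd length (a placeholder for learned values).
    """
    k = len(weights)
    pad = k // 2
    # Squeeze: global average pooling yields one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    c = len(pooled)
    scaled = []
    for i, ch in enumerate(feature_map):
        # 1D convolution over neighbouring channel descriptors (zero padded).
        z = sum(
            weights[j] * pooled[i + j - pad]
            for j in range(k)
            if 0 <= i + j - pad < c
        )
        # Sigmoid gate in (0, 1) that scales the whole channel.
        gate = 1.0 / (1.0 + math.exp(-z))
        scaled.append([[v * gate for v in row] for row in ch])
    return scaled
```

Unlike SE, this gate uses no fully connected bottleneck: the only parameters are the `k` kernel weights shared across all channels, which is why ECA adds very little overhead to the backbone.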

Illegal landfills have become one of the most profitable businesses for mafia organizations, and an ever-growing burden on the economy, the environment and, above all, the health of citizens. To identify these sites promptly and avoid the damage they cause, much research has focused on automating the early detection process. One of the more recent approaches is the use of deep learning models based on Convolutional Neural Networks (CNNs), which could enable large-scale territory monitoring campaigns. In particular, the ResNet50 architecture has already given good results. This thesis aims to evaluate the effects of adding attention methods to this network, which has already been used for this task. These methods are techniques that enhance a CNN by concentrating computational resources on the most significant parts of the input data. Their use in illegal landfill detection could help identify more precisely the individual waste objects present in the images. Among these mechanisms, Squeeze-and-Excitation (SE), the Convolutional Block Attention Module (CBAM) and Efficient Channel Attention (ECA) were implemented on the existing ResNet50 architecture. Many configurations of these were tested, and a quantitative and qualitative evaluation showed that ECA is the best option, capable of improving classification performance. Finally, Class Activation Maps (CAMs), heatmaps that highlight the areas of an image that most determine the classification, were analyzed to better understand the ability of the various models to identify the relevant objects in scenes containing illegal dumps. An ad hoc dataset was created by manually annotating the waste objects present in the images, and a quantitative analysis was carried out by comparing the CAMs with these annotations. The results showed that CAMs could be useful for weakly supervised object detection, and that ECA is the best attention model.
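
As an illustration of how a CAM can be compared quantitatively with an annotated ground truth, the sketch below binarises a heatmap and scores it against a mask with intersection over union (IoU). The abstract does not state which metric the thesis uses, so IoU, the 0.5 threshold, the min-max normalisation and the function names are all assumptions made for this example.

```python
def cam_to_mask(cam, threshold=0.5):
    """Min-max normalise an H x W heatmap and binarise it (threshold assumed)."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no localisation information.
        return [[False for _ in row] for row in cam]
    return [[(v - lo) / span >= threshold for v in row] for row in cam]


def iou(pred, truth):
    """Intersection over union of two binary masks of equal shape."""
    pairs = [(bool(p), bool(t))
             for pr, tr in zip(pred, truth)
             for p, t in zip(pr, tr)]
    inter = sum(p and t for p, t in pairs)
    union = sum(p or t for p, t in pairs)
    return inter / union if union else 0.0


# Toy 2 x 2 example: the CAM fires on the bottom row,
# while the annotation covers only the bottom-right pixel.
cam = [[0.0, 0.2], [0.8, 1.0]]
truth = [[0, 0], [0, 1]]
score = iou(cam_to_mask(cam), truth)
```

Averaging such a score per model over the annotated images would give one number per attention module, which is one possible way to rank them on localisation quality.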