SSC: single-shot multiscale counter. Counting generic objects in images
VAVASSORI, LUCA
2018/2019
Abstract
Counting objects in pictures is a computer vision task that has been explored in recent years, achieving state-of-the-art results thanks to the rise of convolutional neural networks. Most of the work has focused on specific and limited domains, predicting the number of instances of just one category, such as people, cars, cells, or animals. Little effort has been devoted to methods that count the instances of different classes at the same time. This thesis explored the different approaches present in the literature to understand their strengths and weaknesses, and ultimately to improve the accuracy and reduce the inference time of models that estimate the number of elements of multiple classes. First, new techniques were applied on top of the previously proposed algorithms to lower the prediction error. Second, the possibility of adapting an object detector to the counting task while avoiding the localization prediction was investigated. As a result, a new model called Single-Shot Multiscale Counter was proposed, based on the architecture of the Single-Shot Multibox Detector. It achieved a prediction error on the ground truth count that is 11% lower (from an mRMSE of 0.42 to 0.35) and an inference time 16x to 20x faster than the models found in the literature (from 1.25 s to 0.049 s).

File | Size | Format
---|---|---
Luca-Vavassori-Final.pdf (Thesis report; accessible online to everyone) | 4.62 MB | Adobe PDF
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/149916