On-edge device for human tracking and identification based on RGB camera input data

Several tracking systems currently on the market use ultra-wideband (UWB) and Bluetooth Low Energy (BLE) technology and are accurate to within tenths of a meter, for example those from Ubisense, Siemens, Eliko and Sewio. The problem is the cost of the hardware, which remains very high, particularly for UWB. In addition, people must wear battery-powered tags that send signals to gateways installed along the perimeter; the gateways in turn forward the received signals to a central server, which computes the position in real time. In an industrial context this is not a problem, but in a supermarket or a bank, for example, asking customers to wear such devices is impractical. Tracking people from camera video removes the need for tags, and the goal then becomes to select hardware powerful enough to process the frames from the video input in real time while maximizing frames per second (fps).

Image processing and analysis are performed with computer vision software and deep neural network (DNN) libraries such as OpenPose, OpenCV and Caffe, using Python as the programming language. Because this kind of application involves graphics processing and estimating body position within the frame, all in real time, it places a considerable computational load on the GPU (Graphics Processing Unit). Needing compact hardware with decent graphics performance at a moderate cost, we chose the Jetson Nano, an embedded board designed by Nvidia.

Once the system is set up, the next step is to distinguish one individual from another. The software, however powerful it is at recognizing a subject in the image and drawing it as an n-point skeleton, cannot yet tell subjects apart. The concept of digital identity is therefore introduced: a unique ID assigned to each subject based on several parameters, for example physiological characteristics and clothing colors. This is necessary when a subject "A" leaves the field of view of camera 1 and appears in the field of view of camera 2: the subject must remain "A" and not become "B". The same holds for a single camera with multiple subjects entering and leaving its field of view.

The purpose of this thesis is to create an on-edge device that can identify people from a camera video stream in an indoor environment. The device will be part of a larger system, an RTLS (Real-Time Locating System), designed to detect and track people in indoor environments where the GPS signal is poor or absent. The device will be mounted close to the video source, hence the name 'on-edge', which limits data traffic on the indoor network and respects the privacy of the people detected.
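
The digital identity idea can be illustrated with a minimal sketch, which is not the implementation used in this thesis. It assumes that the pixels of each detected person are already available as (R, G, B) tuples, and it uses only clothing color as the distinguishing parameter. The names `color_histogram`, `similarity` and `IdentityRegistry`, the bin count and the 0.6 threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalized RGB histogram with bins**3 cells (illustrative descriptor)."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(x, y) for x, y in zip(a, b))


@dataclass
class IdentityRegistry:
    """Keeps one reference descriptor per known ID (shared by all cameras)."""
    threshold: float = 0.6  # assumed value; would need tuning on real data
    known: Dict[str, List[float]] = field(default_factory=dict)

    def identify(self, pixels: Sequence[Pixel]) -> str:
        desc = color_histogram(pixels)
        best_id: Optional[str] = None
        best_score = 0.0
        for pid, ref in self.known.items():
            score = similarity(desc, ref)
            if score > best_score:
                best_id, best_score = pid, score
        if best_id is not None and best_score >= self.threshold:
            return best_id  # same subject seen again: keep the ID
        new_id = f"person-{len(self.known) + 1}"
        self.known[new_id] = desc  # unseen subject: issue a new ID
        return new_id


# Synthetic "people": a shirt color plus dark trousers, 50 pixels each.
registry = IdentityRegistry()
red_cam1 = [(200, 30, 30)] * 50 + [(20, 20, 20)] * 50
blue_cam1 = [(30, 30, 200)] * 50 + [(20, 20, 20)] * 50
red_cam2 = [(210, 40, 35)] * 50 + [(25, 25, 25)] * 50  # slightly different lighting

id_a = registry.identify(red_cam1)
id_b = registry.identify(blue_cam1)
id_a_again = registry.identify(red_cam2)
```

In this toy run the subject in red keeps the same ID when seen by the second camera, while the subject in blue receives a different one. A real system would combine several parameters, as described above, since clothing color alone cannot separate people who are dressed alike.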
