OSSIANICnet: an open-set, free-utterance speaker identification system

This work presents the design and implementation of OSSIANICnet, an open-set, free-utterance speaker identification system. The topic is particularly relevant because recent years have seen growing interest in voice-controlled automatic systems that offer users a more natural, human-friendly interaction. In particular, an automatic speaker recognition module would allow such systems to provide fully personalized customer services and to behave differently depending on whether the user is already known or completely new. The purpose of our work is to build a speaker identification system that can efficiently discriminate between speech from already known speakers and speech from unseen ones. Moreover, once a new, unseen speaker is identified, the system must be able to create the corresponding model quickly, so that it can be employed in a real-time scenario. This thesis also reports all the experiments and tests that led to the final OSSIANICnet system. The proposed system, based on a neural network and a clustering algorithm with a dynamic threshold, proved to be particularly accurate and efficient at distinguishing known from unknown speakers; it is also fairly robust to noisy and poorly recorded audio, even when the speaker's language differs from the one used to train the system.
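The abstract names only the building blocks (a neural network, a clustering algorithm and a dynamic threshold). As a rough illustration of the open-set decision it describes, the following is a minimal Python sketch under stated assumptions, not OSSIANICnet's actual algorithm. It assumes that utterance embeddings come from some neural encoder, which is not shown, and that speakers are matched by cosine similarity to a cluster centroid. The per-speaker "dynamic threshold" (mean intra-cluster similarity minus `k` standard deviations, clipped to `[floor, ceiling]`) and the names `OpenSetSpeakerID`, `k`, `floor` and `ceiling` are hypothetical choices made for this example.

```python
import math
import random
import statistics


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class OpenSetSpeakerID:
    """Toy open-set speaker identifier over fixed-size embeddings.

    Assumption: embeddings are produced by some neural encoder that is not
    shown here; each one is just a list of floats.
    """

    def __init__(self, k=2.0, default_threshold=0.7, floor=0.5, ceiling=0.9):
        self.k = k  # hypothetical: std-devs below the mean intra-speaker similarity
        self.default_threshold = default_threshold  # used while a speaker has one sample
        self.floor, self.ceiling = floor, ceiling  # bounds for the dynamic threshold
        self.samples = {}  # speaker id -> list of unit-norm embeddings

    def _centroid(self, sid):
        embs = self.samples[sid]
        return _normalize([sum(col) / len(embs) for col in zip(*embs)])

    def _threshold(self, sid):
        """Hypothetical dynamic threshold: tighter clusters get stricter ones."""
        embs = self.samples[sid]
        if len(embs) < 2:
            return self.default_threshold
        c = self._centroid(sid)
        sims = [_dot(e, c) for e in embs]
        t = statistics.mean(sims) - self.k * statistics.pstdev(sims)
        return min(self.ceiling, max(self.floor, t))

    def identify(self, embedding):
        """Return (speaker_id, is_new); unseen speakers are enrolled on the fly."""
        emb = _normalize(embedding)
        best_id, best_sim = None, -1.0
        for sid in self.samples:
            sim = _dot(emb, self._centroid(sid))  # cosine similarity to the centroid
            if sim > best_sim:
                best_id, best_sim = sid, sim
        if best_id is not None and best_sim >= self._threshold(best_id):
            self.samples[best_id].append(emb)  # known speaker: refine its cluster
            return best_id, False
        new_id = len(self.samples)  # unknown speaker: create a new model at once
        self.samples[new_id] = [emb]
        return new_id, True


if __name__ == "__main__":
    rng = random.Random(0)
    alice = [rng.gauss(0, 1) for _ in range(64)]  # simulated speaker vectors
    bob = [rng.gauss(0, 1) for _ in range(64)]

    def utterance(base):
        # simulated embedding of one utterance: speaker vector plus small noise
        return [x + rng.gauss(0, 0.1) for x in base]

    ident = OpenSetSpeakerID()
    print(ident.identify(utterance(alice)))  # -> (0, True)
    print(ident.identify(utterance(alice)))  # -> (0, False)
    print(ident.identify(utterance(bob)))    # -> (1, True)
```

In this sketch, enrolling a new speaker only requires storing a single embedding, which is what keeps on-the-fly model creation cheap. A real system would probably need several utterances per speaker before a data-driven threshold becomes reliable, which is why the sketch falls back to a fixed default until a second sample arrives.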