One of the central problems in psychoacoustics is computing the pitch of an audio signal. Pitch is the property of a sound that allows it to be ordered on a frequency-related scale, from low to high. Many algorithms, such as RAPT and the algorithms in Praat, were created to detect the pitch, also called the fundamental frequency. However, these algorithms sometimes make mistakes, so we decided to create a new pitch detection model based on a neural network. Neural networks are mathematical models inspired by biological neurons. They are widely used for image recognition and, in recent years, have also been used in speech recognition. Among the neural networks used in this field, we considered WaveNet, a network that generates speech that sounds similar to the human voice. PiCoNet is inspired by this network: it is composed of similar functions, modified for computing the pitch. Our model achieved very good results, outperforming both RAPT and Praat.
In psychoacoustics, one of the difficulties in studying a speech audio signal is recognizing its pitch. Several algorithms have been developed to compute the pitch, or fundamental frequency, including RAPT and the four algorithms contained in the Praat program. However, in some cases and for some configurations, the existing methods produce incorrect results. For this reason we decided to develop a new method for computing the pitch, based on a neural network. Used in machine learning, neural networks are mathematical models inspired by the way neurons in the brain work. Widely used for image recognition, this technology has also seen strong growth in speech recognition. Among the existing neural networks, and given the goals of our research, we chose to consider WaveNet, a neural network able to generate speech audio with a voice very similar to a human one. Our network, PiCoNet, is inspired by it and uses some of WaveNet's functions, adapted for pitch extraction. Our network gave us excellent results, reducing the errors made by RAPT and Praat.
PICONET: a neural network for computing the pitch of human voice
POZZI, MATTEO
2016/2017
| File | Size | Format |
|---|---|---|
| Tesi.pdf (not accessible). Description: thesis text | 6.42 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/137542