Network Music Performance (NMP) is changing the traditional concept of interaction between musicians, allowing them to perform together from different locations over an internet connection. However, ensemble playing over the internet requires imperceptible delays and real-time audio transmission, which pose a significant challenge for the telecommunications infrastructure in terms of latency, jitter, and connection quality. Poor network quality can cause packets of the transmitted audio stream to be lost; if they are not recovered, the missing packets cause glitches in playback at the receiver side. Packet Loss Concealment (PLC) techniques address this issue with coding techniques at the sender side and substitution or recovery methods at the receiver side. The latter family of PLC techniques ranges from classic signal processing methods, such as linear predictive coding, to modern deep learning (DL) approaches. In recent years, a growing number of neural-network-based PLC methods have been proposed in the literature; however, their performance is rarely compared with that of well-established signal processing methods. In this manuscript, we compare a simple autoregressive (AR) model with two deep neural networks: a fully connected neural network (FCNN) and a long short-term memory (LSTM) recurrent neural network (RNN). All three models are evaluated in a series of experiments that vary parameters such as the packet length and the input size. The results indicate that AR models are a suitable option when packets are short, whereas DL methods achieve better results as the packet length increases. Of the two DL models, the FCNN outperforms the LSTM network, suggesting that careful hyperparameter tuning is critical for RNN-based PLC methods to achieve competitive performance.
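As a rough illustration of receiver-side concealment with an AR model, the following sketch fits linear-prediction coefficients to the audio that precedes each lost packet and extrapolates them across the gap. It is a minimal sketch under stated assumptions, not the thesis implementation: the covariance-method least-squares fit with a small ridge term, the model order (16), the context length (1024 samples, playing the role of the "input size"), and the helper names `solve_linear`, `fit_ar`, `ar_extrapolate` and `conceal` are all hypothetical choices, since the abstract does not say how the AR model was estimated.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ar(history, order, ridge=1e-6):
    """Covariance-method (least-squares) AR fit on the samples in `history`.

    Returns coeffs with x[n] ~= sum_j coeffs[j] * x[n - 1 - j].
    The relative ridge term is an assumption: it keeps the normal
    equations solvable for strongly periodic or silent input.
    """
    rows = range(order, len(history))
    R = [[sum(history[n - 1 - i] * history[n - 1 - j] for n in rows)
          for j in range(order)] for i in range(order)]
    g = [sum(history[n - 1 - i] * history[n] for n in rows) for i in range(order)]
    avg_diag = sum(R[i][i] for i in range(order)) / order
    for i in range(order):
        R[i][i] += ridge * max(avg_diag, 1e-12)
    return solve_linear(R, g)


def ar_extrapolate(history, coeffs, n_samples):
    """Predict n_samples values, feeding each prediction back as input."""
    buf = list(history[-len(coeffs):])
    out = []
    for _ in range(n_samples):
        pred = sum(c * buf[-1 - j] for j, c in enumerate(coeffs))
        out.append(pred)
        buf.append(pred)
    return out


def conceal(received, packet_len, lost_packets, order=16, context=1024):
    """Fill each lost packet with an AR extrapolation of the audio before it.

    Hypothetical defaults (not from the thesis): order=16, context=1024 samples.
    """
    out = [float(v) for v in received]
    lost = sorted(set(lost_packets))
    for p in lost:  # the receiver never sees lost samples, so start from silence
        for i in range(p * packet_len, min((p + 1) * packet_len, len(out))):
            out[i] = 0.0
    for p in lost:
        start = p * packet_len
        end = min(start + packet_len, len(out))
        # Context may include earlier packets that were themselves concealed
        history = out[max(0, start - context):start]
        if len(history) < 2 * order:
            continue  # too little context: leave the packet zero-filled
        coeffs = fit_ar(history, order)
        out[start:end] = ar_extrapolate(history, coeffs, end - start)
    return out


if __name__ == "__main__":
    # Synthetic 440 Hz tone at 48 kHz with two lost 128-sample packets
    fs, packet_len, lost = 48000, 128, [12, 20]
    x = [math.sin(2 * math.pi * 440.0 * n / fs) for n in range(4096)]
    y = conceal(x, packet_len, lost)
    for p in lost:
        s, e = p * packet_len, (p + 1) * packet_len
        ar_mse = sum((y[i] - x[i]) ** 2 for i in range(s, e)) / packet_len
        zero_mse = sum(x[i] ** 2 for i in range(s, e)) / packet_len
        print(f"packet {p}: AR MSE {ar_mse:.2e}, zero-fill MSE {zero_mse:.2e}")
```

Running the file conceals two 128-sample packets in a synthetic 440 Hz tone and prints the error of the AR extrapolation next to that of plain zero-filling. In a comparison like the one described above, the FCNN and LSTM would presumably take the place of `fit_ar` and `ar_extrapolate`, mapping the same window of past samples to the missing packet, while the packetization and loss handling stay unchanged.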
Comparison of autoregressive models and artificial neural networks for Packet Loss Concealment in Networked Music Performance applications
IGLESIAS del CAMPO, MANUEL
2020/2021
File: ThesisManuelIglesiasdelCampo.pdf (Adobe PDF, 2.05 MB), accessible online only to authorized users.
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/189007