Addressing data scarcity for machine-learning-based failure management in microwave networks

Failure management in communication networks is a critical issue today, as a single failure in the network can disrupt service for thousands or even millions of users at the same time. Preventing failures is therefore a crucial task for network operators, who must meet the Service Level Agreements (SLAs) with their customers. To this end, collecting and analyzing the data generated by the continuous monitoring of network parameters and alarms has become essential to build historical knowledge of network failures and to guide future decisions on how to handle them. Currently, this analysis is carried out by domain experts who, based on their experience, identify the failures and apply proper countermeasures to mitigate them or to restore the service. However, the time humans need to perform the analysis and apply the countermeasures is often incompatible with the stringent service-restoration times that SLAs impose after a failure. To overcome this problem, Artificial Intelligence (AI) and Machine Learning (ML) can provide substantial help, making it possible to automate and speed up the whole network management process by leveraging all the data collected while monitoring the network. In this work, we consider failure management in microwave networks, focusing on the failure-cause identification problem. Specifically, we use supervised ML models to classify hardware failures in microwave networks. These models require a large amount of labelled data to be trained, but gathering data from the field is an expensive and time-consuming process, so it is necessary to devise approaches that cope with an insufficient amount of data.
This condition is known as “data scarcity”, and the main contribution of this work is to identify and compare different ML methodologies that address it, such as the generation of synthetic data using the Synthetic Minority Over-sampling TEchnique (SMOTE), Transfer Learning, Auxiliary-Task Learning, and Denoising Autoencoders. Our numerical results show that, when synthetic data generation is focused on specific failure classes and the amount of data to generate is chosen properly, SMOTE outperforms all the other methodologies.
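The abstract does not state how SMOTE was configured or which implementation was used. The following is only a minimal, stdlib-only sketch of the standard SMOTE interpolation step, restricted to selected classes with a per-class number of synthetic samples, to illustrate the idea of focusing generation on specific failure classes. The function names `smote` and `oversample_selected`, the labels `class_A`/`class_B`, the toy feature vectors, and the parameters `k` and `seed` are all hypothetical assumptions for illustration.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def smote(samples: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic samples for ONE class via SMOTE interpolation (sketch)."""
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples of the class")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)  # cannot have more neighbours than other samples
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # k nearest same-class neighbours of the base sample (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(base, samples[j]),
        )[:k]
        other = samples[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def oversample_selected(
    dataset: Dict[str, List[Vector]], targets: Dict[str, int], k: int = 5, seed: int = 0
) -> Dict[str, List[Vector]]:
    """Augment only the classes listed in `targets`, each by its own amount (hypothetical helper)."""
    augmented = {label: list(rows) for label, rows in dataset.items()}  # copy, do not mutate input
    for label, n_new in targets.items():
        augmented[label].extend(smote(dataset[label], n_new, k=k, seed=seed))
    return augmented


if __name__ == "__main__":
    # Toy, made-up data: one well-populated class and one scarce class.
    data = {
        "class_A": [[float(i), float(i)] for i in range(20)],
        "class_B": [[5.0, 1.0], [6.0, 1.5], [5.5, 2.0]],
    }
    balanced = oversample_selected(data, {"class_B": 10}, k=2)
    print({label: len(rows) for label, rows in balanced.items()})  # {'class_A': 20, 'class_B': 13}
```

Because each synthetic point lies on a segment between two real samples of the same class, only the targeted classes grow and their new points stay within the region spanned by the existing samples; which classes to target and how many points to add are exactly the choices the abstract identifies as decisive.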
