End-to-end delay prediction based on traffic matrix sampling

High-performing computer networks are increasingly important to society and are an active research topic in the scientific community. Most companies would benefit from optimizing the traffic inside their networks, and this is where the Software Defined Networking (SDN) paradigm, built around a centralized controller platform, comes into play. SDN can be readily exploited to perform such an optimization; one example is updating the routing of a network online according to the packet traffic currently flowing through it. A step beyond this solution is to combine the SDN controller with mathematical models that reproduce the behaviour of the network, in order to obtain important network characteristics that would otherwise require real-time simulations. Such models can also be derived with Machine Learning; the literature refers to the combination of SDN and Machine Learning models as Knowledge Defined Networking (KDN). One application of this technique would be a tool that simulates the behaviour of a network and outputs the Quality of Service (QoS) metrics (i.e., characteristics that describe the delivery service at the network level) associated with the current traffic conditions. Such a tool would greatly ease the process of predicting the quality of a routing configuration in a network, and many other use cases exist that make it extremely useful. In this thesis we focus on the problem of predicting QoS metrics in networking; in particular, we address the estimation of end-to-end delay from traffic matrix samples. To this end, we study different machine learning models as a promising tool for characterizing performance in complex computer networks. More specifically, we first provide a simulation platform, GEns-3, based on the ns-3 network simulator, in which each Origin-Destination (OD) flow is a mixture of UDP and TCP traffic, and we use it to generate the data for our study. 
We present three datasets over which we gradually vary the network characteristics: incoming traffic intensity, link capacities, and propagation delays. The datasets are used to train machine learning models, namely Neural Networks and Random Forests, to predict end-to-end delay from OD traffic matrix samples. We show that end-to-end delay can be predicted from traffic matrix samples, and we evaluate the robustness of the models in different test scenarios, against missing inputs (e.g., knowledge of link capacities) and perturbations of the environment (e.g., randomly evolving propagation delays). Numerical results show that both models accurately predict the end-to-end delay on all tested datasets, with Random Forests outperforming Neural Networks by margins as high as 40%.
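
To make the relationship between the three varied characteristics and the predicted quantity concrete, the following minimal sketch computes the mean end-to-end delay of one OD flow from a traffic matrix sample, link capacities, and propagation delays. It is not the GEns-3 simulator or one of the trained models: it assumes a simple M/M/1 queue on every link, fixed routes, and a fixed packet size, and the function names, topology, and numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Node = str
Link = Tuple[Node, Node]
OD = Tuple[Node, Node]


def link_loads(traffic: Dict[OD, float],
               routes: Dict[OD, List[Link]]) -> Dict[Link, float]:
    """Sum the rate (bit/s) of every OD flow that crosses each link."""
    loads: Dict[Link, float] = {}
    for od, rate in traffic.items():
        for link in routes[od]:
            loads[link] = loads.get(link, 0.0) + rate
    return loads


def end_to_end_delay(od: OD,
                     traffic: Dict[OD, float],
                     routes: Dict[OD, List[Link]],
                     capacity: Dict[Link, float],
                     prop_delay: Dict[Link, float],
                     packet_bits: float = 8000.0) -> float:
    """Mean delay (s) of one OD flow, assuming an M/M/1 queue per link.

    Per-link delay = packet_bits / (capacity - load) + propagation delay.
    This analytical model is an illustrative assumption, not the
    simulation or the learned models described in the text.
    """
    loads = link_loads(traffic, routes)
    total = 0.0
    for link in routes[od]:
        spare = capacity[link] - loads[link]
        if spare <= 0:
            raise ValueError(f"link {link} is saturated")
        total += packet_bits / spare + prop_delay[link]
    return total


# Hypothetical three-node line topology A -> B -> C.
traffic = {("A", "C"): 2e6, ("A", "B"): 1e6, ("B", "C"): 3e6}   # bit/s
routes = {
    ("A", "C"): [("A", "B"), ("B", "C")],
    ("A", "B"): [("A", "B")],
    ("B", "C"): [("B", "C")],
}
capacity = {("A", "B"): 10e6, ("B", "C"): 10e6}                 # bit/s
prop_delay = {("A", "B"): 0.001, ("B", "C"): 0.002}             # seconds

delay_ac = end_to_end_delay(("A", "C"), traffic, routes, capacity, prop_delay)
print(f"A->C mean delay: {delay_ac * 1e3:.3f} ms")
```

Under these assumptions the delay grows nonlinearly as the load on a link approaches its capacity, which is one reason a data-driven predictor may be attractive when the real traffic (a mixture of UDP and TCP) does not match a closed-form queueing model.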

The importance of having high-performance computer networks in our society is a hot topic in the scientific community. Most companies, whether in the IT sector or not, would benefit from computer networks whose traffic is highly optimized, and this is where the Software Defined Networking (SDN) paradigm comes into play: a centralized control platform whose task is to monitor the traffic inside the network. This technique can be exploited to optimize the traffic circulating among the nodes; one example is updating the routing tables according to the packet traffic flowing through the network. The technique can be improved by using mathematical models that simulate the behaviour of the network, in order to obtain characteristics of the network and of its traffic that would otherwise require real-time simulations. These mathematical models can also be derived through Machine Learning techniques; the combination of SDN and Machine Learning models gives rise to a new technique known as Knowledge Defined Networking (KDN). One example of such an application is a program that simulates the behaviour of a computer network and, given the traffic conditions as input, outputs quality of service metrics (QoS, i.e., characteristics of the packet delivery service inside the network). It is estimated that this technique can not only ease the process of predicting routing quality, but also greatly increase its performance. Many other use cases can be identified, which is why this work considers the approach extremely useful. This thesis focuses on the problem of predicting quality of service metrics in the networking field; in particular, the metric considered is the so-called end-to-end delay, with samples of the traffic matrix provided as input to the model. 
To achieve this, we consider different Machine Learning models for characterizing performance in complex computer networks. We first provide a program, GEns-3, based on the ns-3 network simulator, in which each Origin-Destination (OD) flow is characterized by a mix of UDP and TCP applications, so as to generate useful data for our study. We present three datasets in which we gradually vary the following network characteristics: traffic intensity, link capacities, and propagation delay. The datasets are used to learn the correct configuration