Federated learning traffic detection: early steps and model performance evaluation

Federated Learning (FL) is a privacy-preserving Machine Learning (ML) approach that enables collaborative model training without sharing raw data. However, its distributed nature and the handling of sensitive information make FL clients susceptible to malicious attacks. In the FL process, clients and a Parameter Server (PS) exchange model parameters over the network, generating distinctive traffic patterns that can be identified. This research studies the detection and classification of FL traffic in a network, with the aim of improving data transmission efficiency and client security and of providing a base model for anomaly detection systems. We created a dataset of FL network traces by simulating different FL training setups based on Flower, an open-source FL framework. For our simulations, we used a Platform as a Service built on the Flower framework and developed using AWS and Google Cloud services. We then developed various classifiers, including Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), as well as deep learning models, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, and compared their performance. The results show that these models are effective, achieving high accuracy in distinguishing FL traffic from non-FL traffic, and highlight the importance of feature selection and of the number of packets per flow used for classification. By enhancing network traffic management and security, this work helps to develop robust and reliable FL platforms. Additionally, our findings provide a foundation for future studies on optimizing FL frameworks and strengthening their resilience against network-based threats.

Federated Learning (FL) is a Machine Learning (ML) method that preserves privacy, allowing models to be trained collaboratively without sharing raw data. However, its distributed nature and the handling of sensitive information make FL clients vulnerable to potential malicious attacks. The FL process involves several clients and an aggregation server that exchange model parameters over the network, generating traffic with specific characteristics that can be detected. This research studies the identification and classification of FL network traffic. Doing so makes it possible to improve data transmission efficiency and client security, and provides the base model for anomaly detection systems. We created a dataset of FL network traces by simulating different configurations of FL systems based on Flower, an open-source FL framework. For our simulations, we used a Platform as a Service based on the Flower framework and on AWS and Google cloud services. We then developed various classifiers, including Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), and deep learning models, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, and subsequently compared their performance. The results show the validity of these models, which achieve high accuracy in classifying federated traffic against non-federated traffic, and highlight the importance of feature selection and of the number of packets per flow used for classification. By improving network traffic management and security, this work contributes to the development of robust and reliable FL platforms. In addition, our results provide a basis for future studies on optimizing FL frameworks and strengthening their reliability against network-based threats.