Comparative analysis of RF and NN for fault classification and regression in distribution networks

In modern power systems, faults must be identified and isolated quickly to maintain reliability and safety. This thesis investigates data-driven techniques for short-circuit fault analysis in distribution systems, focusing on Machine Learning approaches to fault classification and fault location estimation. Fault simulations are performed in DIgSILENT PowerFactory on two feeders, covering four major fault types; the feeders are taken from a real case study of the Milan grid. Two Machine Learning models, Random Forest (RF) and multilayer perceptron (MLP) neural networks, are trained on a diverse set of features, including discrete wavelet transform (DWT) and fast Fourier transform (FFT) features, sequence components, statistical-domain descriptors, and phase energy ratios, to capture both the transient and the steady-state characteristics of a fault. The models are evaluated under ideal and noisy conditions, with Gaussian noise added at varied signal-to-noise ratio (SNR) levels to test robustness. Extensive experiments are performed, including cross-condition tests (training and testing on different fault types with different SNR levels). Classification performance is assessed by accuracy and F1 score, while location estimation is assessed using F1 score, mean absolute error (MAE), and the coefficient of determination (R²). The results confirm that both the RF and MLP models achieve high fault classification accuracy and precise fault location estimation. Including transient features, such as the high-frequency signal components that appear immediately after a disturbance or fault in the power system, had a large impact on distinguishing fault types and pinpointing their location, because these components carry rich diagnostic information. SHAP analysis is included to rank feature importance and to generate counterfactual explanations for the model predictions.
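The noisy test cases depend on adding Gaussian noise at a chosen SNR. A minimal Python sketch of that step follows, for illustration only: the function names, the 50 Hz test waveform, and the 10 kHz sampling rate are assumptions and are not taken from the thesis, which may have generated its noise differently.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with zero-mean white Gaussian noise at a target SNR (dB)."""
    rng = rng or random.Random()
    # Average signal power: mean of the squared samples.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) of `noisy` relative to the known `clean` signal."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical waveform: a 50 Hz sine sampled at 10 kHz for 0.2 s (assumed values).
fs, f0 = 10_000, 50
clean = [math.sin(2 * math.pi * f0 * k / fs) for k in range(2000)]

# Fixed seed so the example is reproducible.
noisy = add_gaussian_noise(clean, snr_db=20, rng=random.Random(0))
print(f"measured SNR: {measured_snr_db(clean, noisy):.1f} dB")
```

The measured SNR only approximates the target because the noise is a finite random sample; sweeping `snr_db` over several values yields the kind of varied-SNR test sets described above.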
This thesis emphasizes the high accuracy and speed of ML-based fault diagnosis, the benefits of multi-domain feature engineering for capturing fault signatures, and the practical considerations for deploying such models in noisy, real-world distribution networks. Overall, the study presents a comparative analysis of the RF and MLP models, highlighting their strengths in fault classification and localization and offering guidance on model selection for real-world deployment. Under ideal conditions, RF achieved close to 100% accuracy on both the long and the short feeder, whereas the NN achieved 99% accuracy on the short feeder and 96% accuracy in detecting a fault on the long feeder. Both models achieved 98-99% accuracy in fault location estimation.

Keywords: Fault diagnosis, regression, classification, Random Forest, Neural Networks, noisy conditions, multi-class classification, discrete wavelet transform, RF and MLP comparison.
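For reference, the two regression metrics used for fault location, MAE and R², can be computed as in the short Python sketch below. The distances are made-up illustrative values, not results from the thesis, and the function names are chosen here for the example.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical fault distances along a feeder in km (illustrative values only).
true_km = [1.0, 2.0, 3.0, 4.0]
pred_km = [1.1, 1.9, 3.2, 3.8]

mae = mean_absolute_error(true_km, pred_km)
r2 = r2_score(true_km, pred_km)
print(f"MAE = {mae:.3f} km, R2 = {r2:.3f}")
```

MAE is expressed in the same unit as the fault distance, while R² is dimensionless and reaches 1 only when every prediction is exact.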

In modern electrical systems, it is essential to identify and isolate faults quickly in order to maintain reliability and safety. This thesis concentrates on data-driven techniques for the analysis of short-circuit faults in distribution systems, focusing mainly on Machine Learning approaches to fault classification and fault location estimation. The fault simulations were carried out in DIgSILENT PowerFactory on two distribution lines, covering four main fault types; the selected lines come from a real case study of the Milan grid. The Machine Learning models considered are Random Forest (RF) and multilayer perceptron (MLP) neural networks, trained on a diverse set of features, including discrete wavelet transforms (DWT), fast Fourier transforms (FFT), sequence components, statistical-domain descriptors, and phase energy ratios, to capture both the transient and the steady-state characteristics of a fault. The models are evaluated under ideal and noisy conditions. Gaussian noise at different SNR levels was added to test their robustness. Extensive experiments were conducted, including cross-condition tests (training and testing on different fault types with different SNR levels). Model performance is assessed by accuracy and F1 score, while fault location is estimated using F1 score, mean absolute error (MAE), and the coefficient of determination (R²). The results confirm that the RF and MLP models achieve high accuracy in fault classification and a precise estimate of fault position.
The inclusion of transient features, such as the high-frequency signal components that occur immediately after a disturbance or fault in the electrical system, had a significant impact on distinguishing fault types and identifying their position, since these components contain valuable diagnostic information. A SHAP analysis was included to rank feature importance and to generate counterfactual explanations of the model predictions. This thesis concentrates mainly on the high accuracy and speed of ML-based fault diagnosis, on the benefits of multi-domain feature engineering for capturing fault signatures, and on the practical considerations for deploying such models in noisy, real distribution networks. In general, the study presents a comparative analysis of the RF and MLP models, highlighting their strengths in fault classification and localization and offering useful guidance for choosing the model to adopt in real environments. RF reached about 100% accuracy under ideal conditions on long and short lines. By comparison, the neural network reached 99% accuracy on short lines and 96% in detecting a fault on long lines. Both models obtained an accuracy of 98-99% in fault location estimation.