Machine Learning methods for clustering and day-ahead load forecast of thermal power plants

Growing interest in energy efficiency and the large volume of data involved call for methods that improve the management of a set of buildings connected to a thermal utility. This thesis proposes a method that uses unsupervised clustering to group utilities and then neural networks to predict consumption under difficult training conditions. The analysis is based on the variables collected from smart meters, together with weather parameters. The method was developed on an operating district heating network, so the data are entirely real and are affected by interruptions and missed readings; a methodology for data cleaning, pre-processing and post-processing is therefore proposed. The goal of the work is to predict utility consumption with machine learning methods (neural networks). Three clustering methods were compared, k-means, hierarchical clustering and DBSCAN, to establish which is the most appropriate for the case study. Hierarchical clustering was found to be the most reliable and to give the most convincing results, according to the indices used to assess clustering quality. To predict the required thermal energy consumption, three strategies were adopted: training one neural network for all utilities, training one neural network per cluster, and training one neural network per utility. The last strategy gave the best results, though not far ahead of the second, which could prove successful as the amount of data and the number of utilities analyzed increase.

Growing interest in energy efficiency and the large amount of data involved make it necessary to adopt a method that aims to improve the management of a set of buildings connected to a thermal utility. This thesis proposes a method based on unsupervised clustering to classify utilities and then uses neural networks to predict consumption. The variables collected from smart meters form the basis of the analysis, together with weather data. The proposed method is based on an actual operating district heating network, so entirely real data are used; these are affected by gaps and missed readings, so a preliminary data cleaning is performed, followed by post-processing of the data. The goal of the work is to predict consumption through machine learning methods such as neural networks. Three clustering strategies were examined, k-means, hierarchical clustering and DBSCAN, to establish which is the most appropriate for this case study. Hierarchical clustering proved the most reliable, with the most convincing results, based on the indices used to evaluate the quality of the grouping. Finally, to predict the required thermal energy consumption, three strategies were adopted: (1) training one neural network for all utilities, (2) using one neural network for each cluster and (3) adopting one neural network for each utility. The results show that strategy 3 performed best, although not far ahead of strategy 2, which could prove the better choice if the data and the number of utilities analyzed increase.
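
The abstract does not name the indices used to compare the clustering results. As an illustration only, the sketch below assumes one common choice, the silhouette coefficient, and computes it in plain Python for two candidate groupings of a few hypothetical utility profiles. The feature values, the labels and the function name `silhouette_score` are assumptions made for this example and are not taken from the thesis.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means tighter,
    better separated clusters. Assumed index, not confirmed by the thesis."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is excluded by dividing by n - 1)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical utility profiles: (mean daily load, peak-to-mean ratio)
profiles = [(1.0, 0.20), (1.1, 0.25), (0.9, 0.15),
            (5.0, 0.80), (5.2, 0.85), (4.9, 0.75)]

# Two candidate labelings, e.g. from two different clustering algorithms
separated = [0, 0, 0, 1, 1, 1]
mixed = [0, 1, 0, 1, 0, 1]

score_separated = silhouette_score(profiles, separated)
score_mixed = silhouette_score(profiles, mixed)
print(f"separated: {score_separated:.3f}, mixed: {score_mixed:.3f}")
```

In a comparison of this kind, the labeling with the higher score would be preferred. A real study would normally scale the features first and combine several indices rather than rely on one.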