Residential energy expenditure in Germany : machine learning-based modeling and investigation of the determinants
Alizadeh Sahzabi, Behnam
2021/2022
Abstract
This study applies machine learning to model residential energy expenditure in Germany and to investigate its determinants. The dataset consists of micro-data on 42,226 households in Germany and covers four categories of parameters: appliances, location, building properties, and socio-economic characteristics. The estimation targets are the households' expenditure on electrical demand, on thermal demand, and on total energy consumption. For each pipeline, the performance of several machine learning models is first determined and compared: Multi-Linear Regression (MLR), Random Forest (RF), Support Vector Machine (SVM), Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and a combination of CNN and LSTM. Next, a hybrid forward feature selection procedure is performed to identify the key determinants of residential energy expenditure. A regression coefficient analysis then examines the effect of each selected parameter on each estimation target. Finally, to compare the influence of the parameter categories, each pipeline is evaluated with the LSTM algorithm while only one parameter category is provided at a time. The results show that the LSTM performs best for electrical and total energy expenditure estimation, with coefficients of determination (R2 scores) of 51.5% and 45.1% respectively, while the CNN (R2 score of 42.0%) is the most promising model for thermal energy expenditure. These scores are in line with those reported by the majority of similar studies.

Furthermore, the feature selection procedure reduces the number of utilized features by 50%. Finally, the parameter category covering building properties is, as expected, the most influential one: using these variables alone, the LSTM algorithm estimates the electrical, thermal, and total energy expenditure with R2 scores of 0.440, 0.425, and 0.361 respectively.

| File | Description | Size | Format |
|---|---|---|---|
| 2022_07_Alizadeh_Sahzabi.pdf (not accessible) | Thesis text | 2.45 MB | Adobe PDF |
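
The forward feature selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's hybrid procedure, whose details are not given here. It is a plain greedy forward selection around an ordinary least-squares model, scored by the R2 score on a held-out split. The synthetic data, the `min_gain` threshold, and all function names are assumptions made for illustration only.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def take(rows, cols):
    """Keep only the given feature columns of each row."""
    return [[r[c] for c in cols] for r in rows]

def forward_select(x_train, y_train, x_val, y_val, min_gain=0.01):
    """Greedily add the feature that most improves validation R2.

    Stops when the best remaining candidate improves R2 by less than
    min_gain (an assumed, illustrative threshold).
    """
    n_features = len(x_train[0])
    selected, best_r2 = [], 0.0  # 0.0 is roughly the score of a mean-only model
    while len(selected) < n_features:
        candidates = []
        for f in range(n_features):
            if f in selected:
                continue
            cols = selected + [f]
            coef = fit_linear(take(x_train, cols), y_train)
            score = r2_score(y_val, predict(coef, take(x_val, cols)))
            candidates.append((score, f))
        score, f = max(candidates)
        if score - best_r2 < min_gain:
            break
        selected.append(f)
        best_r2 = score
    return selected, best_r2

# Synthetic stand-in data (NOT the household micro-data): four features,
# of which only features 0 and 2 actually drive the target.
rng = random.Random(0)

def make_rows(n):
    xs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]
    ys = [3.0 * x[0] - 2.0 * x[2] + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

x_train, y_train = make_rows(300)
x_val, y_val = make_rows(100)
selected, best_r2 = forward_select(x_train, y_train, x_val, y_val)
print("selected features:", selected, "validation R2:", round(best_r2, 3))
```

In this toy setup the two noise features add almost nothing to the validation R2, so the loop stops once the informative features are in. The same loop could wrap any of the models named in the abstract, at a higher training cost per candidate.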
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/191833