Development of machine learning-based pipelines for user thermal preference modelling employing data on interactions with thermostats in a smart building

This thesis proposes a machine learning (ML)-based approach for modelling users' individual thermal preferences in smart buildings. A dataset of users' interactions with the thermostats in five rooms of a medical center building is used as the case study. In the first part, the thermostat offset that the user selects in each interaction is defined as the estimation target, and its value is categorized into three classes (Scenario A1) or five classes (Scenario A2). In the second part, the timestamps at which the user did not interact with the thermostat are also taken into account as an additional class (user is satisfied), while the remaining classes are defined as in the first part (resulting in Scenarios B1 and B2). ML-based pipelines are then implemented and trained to estimate these thermal preference classes, taking the room temperature, outdoor temperature, solar irradiation, previously imposed setpoint and existing offset (at the corresponding timestamp) as input features. Class balancing and resampling techniques are employed to handle data imbalance, and five different algorithms are used to perform the classification task. A feature selection procedure is also carried out to identify the most relevant features for each case and thereby simplify the pipelines. The results show that the most promising algorithms for classifying the user's thermal behaviour are CatBoost (CB) and Random Forests (RF). As expected, the classification performance achieved for Scenarios B1 and B2 is higher than that obtained for Scenarios A1 and A2. The synthetic minority oversampling technique (SMOTE) also proves to be a suitable method for handling notable imbalance in the data (e.g. in the dataset employed in Scenarios B1 and B2), with both the CB and RF classifiers achieving accuracy and F1 scores higher than 99%. It is therefore demonstrated that data from occupants' interactions with thermostats can serve as user feedback (without any additional device) for implementing HVAC control strategies based on personalized thermal preference.
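The labelling and oversampling steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the class names, the `dead_band` threshold, the helper names (`offset_to_class`, `label_timestamp`, `smote`) and the sample rows are all hypothetical, since the abstract states neither the class boundaries nor which SMOTE implementation was used. The sketch only shows the core idea of synthetic minority oversampling, which is to create new minority samples by interpolating between an existing sample and one of its nearest minority-class neighbours.

```python
import math
import random


def offset_to_class(offset, dead_band=0.0):
    """Map a thermostat offset to one of three hypothetical preference classes.

    The class names and the dead_band threshold are assumptions for illustration.
    """
    if offset > dead_band:
        return "warmer"
    if offset < -dead_band:
        return "cooler"
    return "no_change"


def label_timestamp(offset):
    """Label one timestamp in the spirit of the B scenarios.

    offset is None when the user did not interact with the thermostat,
    which is treated as the additional "satisfied" class.
    """
    if offset is None:
        return "satisfied"
    return offset_to_class(offset)


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class feature tuples.

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    In practice the features should be scaled first, because Euclidean
    distance mixes units (degrees, W/m^2) otherwise.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Made-up rows for illustration only, in the feature order listed in the abstract:
# (room temperature, outdoor temperature, solar irradiation, previous setpoint, existing offset)
warmer_rows = [
    (21.0, 5.0, 120.0, 21.0, 0.0),
    (20.5, 3.0, 80.0, 21.0, 0.0),
    (21.5, 7.0, 200.0, 22.0, 0.5),
]
extra_rows = smote(warmer_rows, n_new=4, k=2)
print(label_timestamp(None), label_timestamp(1.0), len(extra_rows))
```

In a real pipeline, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written one, and oversampling would be applied to the training split only, so that synthetic samples do not leak into the evaluation data.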
