Inverse reinforcement learning for water reservoirs

The problem of sequential decision making is central to modern Artificial Intelligence, and Reinforcement Learning (RL) is one of the most effective approaches to it. One of the main assumptions of RL is that every interaction with the environment yields a scalar value representing the reward for a specific state-action pair. In many real-world cases, however, defining a reward function that encodes a desired behaviour is difficult, whereas sampling trajectories of an expert interacting with the environment is easy. Inverse Reinforcement Learning (IRL) was devised for this setting: its goal is to find a reward function under which the expert’s behaviour is optimal. The recovered reward gives a concise and meaningful model of the agent’s behaviour and objectives. In this thesis, we apply IRL to a specific real-world problem: modelling the behaviour of dam operators across the USA. Dams control approximately 46% of large rivers worldwide, so the role of human decisions deserves further investigation to better understand its impact. The recent publication of ResOpsUS, a dataset covering over 600 dams in the USA, has enabled us to use a truly batch model-free algorithm (Σ-GIRL) to model human behaviour. The final goal of this thesis is therefore to produce a valid behavioural model by inferring, from daily dam operations alone, the main objectives that each dam pursues. We then employ MI-Σ-GIRL, an algorithm devised to cluster agents that pursue the same objectives. Finally, we show that our model successfully recognises the different objectives both in single-objective dams and in multipurpose dams.

The problem of sequential decision making is of fundamental importance in modern Artificial Intelligence. Reinforcement Learning (RL) is one of the most effective approaches to this kind of problem. One of its main assumptions is that at every interaction with the environment we receive a scalar value representing the reward for a specific state-action pair. In many real cases, however, defining a reward function able to encode a desired behaviour is a complex task, while it is easy to sample the trajectories of an expert interacting with the environment. Inverse Reinforcement Learning (IRL) is an approach devised to overcome this problem. Its main goal is to find a reward function that makes the expert’s behaviour optimal. This gives us a concise and meaningful way to model the agent’s behaviour and its objectives. In this thesis we use IRL to address a specific problem, namely modelling the behaviour of dam operators in the United States. Dams control about 46% of large rivers worldwide, and the role of human decisions should therefore be studied further to better understand its impact. The recent publication of ResOpsUS, a dataset collecting data on over 600 dams in the United States, has allowed us to use a truly batch model-free algorithm (Σ-GIRL) to model human behaviour. The final goal of this thesis is therefore to produce a valid behavioural model, understanding from the daily dam operations alone which main objectives each dam pursues. We then also use MI-Σ-GIRL, an algorithm devised to group agents that pursue the same objectives. Finally, we show that our model successfully recognises the different objectives both in dams with a single objective and in multipurpose dams.
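
To make the idea of a reward function under which the expert's behaviour is optimal more concrete, the sketch below illustrates a simplified gradient-based formulation. It is not the Σ-GIRL algorithm used in this thesis. It assumes that the reward is a linear combination of known features with non-negative weights summing to one, and that a matrix of policy gradients (one column per reward feature) has already been estimated from the expert's trajectories. If the expert is optimal, the weighted combination of these gradients should vanish, so the weights are sought by minimising its squared norm over the simplex. The names `project_to_simplex`, `gradient_irl_weights` and `jacobian`, and the toy numbers, are hypothetical and chosen only for illustration.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def gradient_irl_weights(jacobian, steps=2000):
    """Minimise ||jacobian @ w||^2 over the simplex by projected gradient descent.

    jacobian has shape (policy parameters, reward features); column i is the
    (assumed, pre-estimated) policy gradient under reward feature i.
    """
    n_features = jacobian.shape[1]
    gram = jacobian.T @ jacobian
    # Step size 1/L, where L = 2 * largest eigenvalue of the Gram matrix.
    lr = 1.0 / (2.0 * np.linalg.norm(gram, 2) + 1e-12)
    w = np.full(n_features, 1.0 / n_features)  # start from uniform weights
    for _ in range(steps):
        grad = 2.0 * gram @ w
        w = project_to_simplex(w - lr * grad)
    return w


# Toy example (made-up numbers): the third column is built so that the
# weights (0.5, 0.25, 0.25) cancel the combined gradient exactly.
jacobian = np.array([[1.0, 0.0, -2.0],
                     [0.0, 1.0, -1.0]])
weights = gradient_irl_weights(jacobian)
print(np.round(weights, 3))
```

In this toy case the recovered weights can be read as the relative importance of three hypothetical objectives. With real data the gradients are noisy estimates, which this plain least-squares sketch ignores.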