Multi-drones mission planning with hierarchical reinforcement learning
Vitobello, Andrea
2022/2023
Abstract
Hierarchical reinforcement learning (HRL) is a learning approach that stems from traditional reinforcement learning (RL). RL aims to teach an actor, generally called the agent, the best course of action for a specific task. HRL shares the same objective as RL, but it decomposes the task into multiple sub-tasks organized across different levels: this abstraction distributes the complexity of the problem over multiple levels, thereby improving the scalability and flexibility of the framework. This thesis applies HRL to multi-drone mission planning tasks, in which a team of simulated heterogeneous drones must destroy a set of targets according to a pre-defined engagement protocol while avoiding being destroyed by adversary defenses. The aircraft therefore interact with the environment through movement actions and engagement actions, the latter meant to modify the state of the enemies. The proposed hierarchy extends the feudal approach, which traditionally consists of a two-level hierarchy where the higher level suggests sub-goals to be achieved by the lower level, to three levels. The results obtained with HRL, together with deep RL techniques such as Proximal Policy Optimization (PPO) to derive a behavioural policy, show that this approach is a viable way to solve complex problems with large action spaces and relatively long time horizons.
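As a rough illustration of the control flow the abstract describes (not code from the thesis; every class, method, and sub-goal name below is hypothetical), the following Python sketch shows a three-level feudal loop in which each level proposes a sub-goal for the level below and only the lowest level emits primitive movement or engagement actions; the random choices stand in for learned (e.g. PPO) policies.

```python
# Minimal sketch of a three-level feudal control loop (hypothetical names,
# not the thesis implementation): each manager proposes a sub-goal for the
# level below; only the worker emits primitive movement/engagement actions.
import random

class Level:
    """One level of the hierarchy; `horizon` is how many lower-level steps
    a sub-goal stays fixed before this level decides again."""
    def __init__(self, name, horizon, choices):
        self.name = name
        self.horizon = horizon
        self.choices = choices  # stand-in for a learned (e.g. PPO) policy

    def decide(self, observation, goal_from_above):
        # A trained policy would condition on (observation, goal_from_above);
        # here we pick randomly just to show the control flow.
        return random.choice(self.choices)

# Three levels: mission manager -> target manager -> drone worker.
mission = Level("mission", horizon=8, choices=["strike_zone_A", "strike_zone_B"])
target  = Level("target",  horizon=4, choices=["approach", "suppress_defense", "engage"])
worker  = Level("worker",  horizon=1, choices=["move_n", "move_s", "move_e", "move_w", "fire"])

observation = {"step": 0}
for t in range(16):
    # Higher levels re-plan only every `horizon` steps (temporal abstraction).
    if t % mission.horizon == 0:
        mission_goal = mission.decide(observation, goal_from_above=None)
    if t % target.horizon == 0:
        sub_goal = target.decide(observation, goal_from_above=mission_goal)
    action = worker.decide(observation, goal_from_above=sub_goal)
    print(f"t={t:02d}  mission={mission_goal:13s}  sub-goal={sub_goal:16s}  action={action}")
    observation["step"] = t + 1
```

The `horizon` values illustrate the temporal abstraction behind the approach: higher levels commit to a sub-goal for several lower-level steps, which is one reason a hierarchy can cope with large action spaces and long time horizons better than a flat policy.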
File | Description | Size | Format
---|---|---|---
2024_04_Vitobello_Executive Summary_02.pdf (not accessible) | Executive Summary | 900.16 kB | Adobe PDF
2024_04_Vitobello_Tesi_01.pdf (not accessible) | Thesis | 5.02 MB | Adobe PDF
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/219153