BTGenBot: Behavior Tree Generation for Robotic Tasks with Lightweight LLMs

This thesis presents a novel approach to generating behavior trees for robots using lightweight large language models (LLMs) with at most 7 billion parameters. The study shows that compact LLMs can achieve satisfactory results when fine-tuned on a specific dataset. The key contributions are: a fine-tuning dataset built from existing behavior trees with the help of GPT-3.5; a custom validation system; a comparison of multiple LLMs (namely llama2, llama-chat, and code-llama) across nine distinct tasks; and a complete pipeline for executing the generated behavior trees directly on a real robot. We evaluated the generated behavior trees in several ways: syntactic and semantic analysis, the custom validation system, a simulated environment, and a real robot. The validation system is built on ROS2: it executes predefined behavior trees, tracks their outcomes, and logs the results. In addition, we explored the potential of compact LLMs as evaluators of generated behavior trees. Finally, we tested the behavior trees on a physical robot using our ROS2-based system, which parses each tree and executes a corresponding ROS action for each tree node. This work opens the possibility of deploying such solutions directly on the robot, improving their practical applicability. The findings demonstrate the potential of LLMs with a limited number of parameters to generate effective and efficient robot behaviors.
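To illustrate the parse-and-execute step described above, the following is a minimal sketch, not the thesis implementation. It assumes a BehaviorTree.CPP-like XML layout, which the abstract does not specify, and it replaces ROS actions with plain Python callables. All names (`TREE_XML`, `tick`, `run_tree`, `ACTIONS`, and the node names `MoveTo`, `Grasp`, `AskForHelp`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree in a BehaviorTree.CPP-like XML layout (assumed format).
TREE_XML = """
<root>
  <BehaviorTree ID="MainTree">
    <Sequence>
      <MoveTo goal="kitchen"/>
      <Fallback>
        <Grasp item="cup"/>
        <AskForHelp/>
      </Fallback>
    </Sequence>
  </BehaviorTree>
</root>
"""


def tick(node, actions, log):
    """Tick one node and return True on success, False on failure."""
    if node.tag == "Sequence":
        # A Sequence succeeds only if every child succeeds; all() stops at the first failure.
        return all(tick(child, actions, log) for child in node)
    if node.tag == "Fallback":
        # A Fallback succeeds as soon as one child succeeds; any() stops at the first success.
        return any(tick(child, actions, log) for child in node)
    # Leaf node: look up a handler by tag name. In a real system this would
    # send a ROS action goal; here it is a plain callable (assumption).
    handler = actions.get(node.tag)
    ok = bool(handler(**node.attrib)) if handler else False
    log.append((node.tag, dict(node.attrib), ok))
    return ok


def run_tree(xml_text, actions):
    """Parse the XML, execute the tree, and return (result, log)."""
    root = ET.fromstring(xml_text)
    tree = root.find("BehaviorTree")
    log = []
    result = all(tick(child, actions, log) for child in tree)
    return result, log


# Stand-in handlers: each returns True (success) or False (failure).
ACTIONS = {
    "MoveTo": lambda goal: True,
    "Grasp": lambda item: False,  # simulate a failed grasp
    "AskForHelp": lambda: True,
}

result, log = run_tree(TREE_XML, ACTIONS)
for name, params, ok in log:
    print(name, params, "SUCCESS" if ok else "FAILURE")
print("tree result:", "SUCCESS" if result else "FAILURE")
```

The log of per-node outcomes mirrors the idea of tracking and recording results in the validation system. Unknown node types are treated as failures here, which is one possible design choice rather than the behavior reported in the thesis.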
