Generative AI for Customer Support: An Empirical Investigation of RAG-Based Chatbot Architectures in an ERP Scenario

This thesis investigates the development of a customer support chatbot for an Enterprise Resource Planning (ERP) system, with the aim of reducing help desk workload and improving user satisfaction. The main objectives are to reduce the number of support tickets by 20%, achieve a user satisfaction rate of 80% or higher, and minimize human error in ticket creation, while maintaining high performance and efficiency. The solution is developed through a progressive, data-driven methodology that employs off-the-shelf Large Language Models (LLMs). Because these models are not trained on company-specific data, they must be augmented with techniques such as Retrieval-Augmented Generation (RAG) and prompting, which integrate the company knowledge base and tailor the model to the task. The selected LLMs benefit from extensive pretraining on diverse datasets, which gives them a strong understanding of natural language, including Italian, and avoids the need for resource-intensive training from scratch. The focus is instead on combining these models with external private knowledge and carefully crafted instructions to reach the desired chatbot quality; this thesis is therefore centered on the development of a RAG system. Evaluating the chatbot's effectiveness was a major challenge in the absence of a real evaluation dataset. For this reason, part of this work focuses on an evaluation framework, namely a pipeline that synthetically generates an evaluation set through LLM inference and uses an LLM as a judge. The results are evaluated with a set of carefully selected metrics (ROUGE-1, BERT-F1 score, and SemScore) applied to the synthetically generated dataset. These metrics, together with human feedback, are used to assess the quality of the chatbot's responses and to identify the optimal RAG-based chatbot solution.
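The retrieval-augmented setup described above can be illustrated with a deliberately small sketch. It assumes a toy in-memory knowledge base (`KNOWLEDGE_BASE`), a bag-of-words cosine retriever standing in for the dense embedding retriever a production RAG system would normally use, and a hypothetical prompt template. The helper names `retrieve` and `build_prompt` are illustrative only and are not the thesis's implementation; the call to the LLM itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base snippets standing in for the ERP documentation.
KNOWLEDGE_BASE = [
    "To create a sales order, open the Orders menu and click New.",
    "Invoices can be exported to PDF from the Billing section.",
    "Password resets are handled from the user profile page.",
]


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a real system would embed text with a model instead.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Hypothetical instruction template: ground the answer in retrieved context.
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "You are a support assistant for an ERP system. "
        "Answer using only the context below; if it is insufficient, say so.\n"
        f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"
    )


query = "How do I create a new sales order?"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE_BASE, k=1))
print(prompt)  # this string would then be sent to the chosen LLM
```

The division of labor mirrors the approach described in the abstract: the pretrained model supplies language understanding, while retrieval injects private, company-specific knowledge and the instructions constrain how that knowledge is used.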
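Of the three metrics, ROUGE-1 is simple enough to compute by hand. It counts the unigrams shared by a generated answer and a reference answer: precision is the overlap divided by the candidate length, recall is the overlap divided by the reference length, and F1 is their harmonic mean. The sketch below is a minimal illustration that assumes lowercase whitespace tokenization and no stemming; established packages such as `rouge-score` apply their own tokenizers, so their numbers can differ slightly. BERT-F1 and SemScore compare embeddings produced by pretrained models and are not reproduced here.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict[str, float]:
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap.

    Assumes simple lowercase whitespace tokenization (no stemming).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection: each unigram counts at most min(count_cand, count_ref).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "open the orders menu and click new"
candidate = "open the orders menu and select new order"
scores = rouge1(candidate, reference)
print(scores)  # precision 0.75, recall ~0.857, f1 ~0.8
```

Here six of the eight candidate tokens appear in the seven-token reference, so precision is 6/8 and recall is 6/7. Because ROUGE-1 rewards only surface overlap, it is natural to pair it with embedding-based metrics that can credit paraphrases.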
This thesis contributes to ongoing research on RAG-based chatbot assistants by balancing performance against implementation complexity, improving the system iteratively, and evaluating the trade-off between cost and effectiveness through both automatic metrics and human feedback.

Questa tesi esplora lo sviluppo di un chatbot per il supporto clienti in un sistema ERP, con l'obiettivo di ridurre il carico di lavoro dell’help desk e migliorare la soddisfazione degli utenti. Gli obiettivi principali sono: ridurre la quantità di ticket del 20%, ottenere un tasso di soddisfazione maggiore del 80% e minimizzare gli errori umani durante la creazione dei ticket, garantendo allo stesso tempo alte prestazioni ed efficienza. Il sistema si basa su una metodologia progressiva e guidata dai dati, utilizzando LLM preaddestrati. Poiché questi modelli non sono allenati sui specifici dati aziendali, i modelli devono essere integrati con altre techniche, come Retrieval-Augmented Generation (RAG) e Prompting per aggiungere la base dati e adattare il sistema al compito desiderato. I LLM selezionati sfruttano un pretraining su dataset diversificati, garantendo una solida comprensione del linguaggio naturale, incluso l'italiano. Questo approccio evita addestramenti costosi, concentrandosi sulla combinazione di conoscenze esterne e istruzioni mirate con le capacità dei modelli, per migliorare la qualità del chatbot, quindi questa tesi si basa sullo sviluppo di un sistema RAG. La valutazione del chatbot è complessa in assenza di un dataset reale. Per questo, è stata sviluppata una pipeline di valutazione sintetica, sfruttando LLM per la generazione e come giudici. I risultati vengono analizzati con metriche accuratamente selezionate (ROUGE-1, BERT-F1 e SemScore), applicate a un dataset generato sinteticamente. Queste metriche, insieme al feedback umano, valuteranno la qualità delle risposte per identificare la soluzione RAG ottimale. Questa tesi contribuisce alla ricerca sui chatbot RAG, bilanciando prestazioni e complessità implementativa, migliorando iterativamente il sistema e valutando il compromesso tra costo ed efficacia, tramite metriche automatiche e feedback umano.