An empirical study on how software testing students interact with large language models

Software testing is one of the most challenging topics to teach in software engineering, as students often perceive it as tedious and repetitive. In recent years, gamification has been introduced in software testing education to enhance students' engagement. For example, the Code Defenders platform provides a gamified environment for learning mutation testing. However, while gamification addresses students' lack of motivation, it does not resolve all the difficulties they may face when applying testing concepts in practice. The advent of Large Language Models (LLMs) presents an opportunity to tackle this problem. LLMs are becoming popular in software engineering education because they can assist educators by generating teaching materials and support students by explaining unclear topics. However, LLMs also pose a risk to education, as students may use them to generate ready-made solutions and cheat on assignments. This research investigates the impact of LLMs on software testing students by observing their interactions with an unrestricted LLM-based tool. Specifically, this work presents AI Defenders, a GPT-based smart assistant that allows students to ask software testing questions while playing Code Defenders games. An empirical study was then conducted with students from an academic software testing course to understand how they use the assistant while practising mutation testing. The study reveals that students have unrealistic expectations of the assistant, ask very simple questions, and frequently struggle to use the generated responses effectively when writing tests. Additionally, many students attempt to use the assistant to generate complete solutions to their testing tasks directly. As a result, the assistant proves beneficial only for the small percentage of students who are able to use it correctly, and on average students' performance decreases while using it. These findings help raise awareness of the risks of introducing unrestricted LLM-based tools in educational contexts and emphasise the need to train students in using them.

Teaching software testing is one of the greatest challenges in software engineering education. In recent years, gamification has frequently been used to increase students' interest in this subject, which is often perceived as boring and repetitive. A notable example is the Code Defenders platform, which provides a game environment for teaching mutation testing. Although gamification motivates students to study software testing, it does not resolve the difficulties they encounter while learning the subject. Large Language Models (LLMs) are a possible tool for addressing this problem. They are increasingly popular in education because they allow teachers to generate teaching material easily and allow students to obtain detailed explanations and clarifications while studying. However, using LLMs also carries risks, as students could exploit these tools to generate solutions to assignments and exams. This thesis investigates the impact that LLMs can have on learning software testing by observing and analysing students' interactions with these models. In particular, this research describes the development of AI Defenders, a smart assistant based on GPT models that allows students to ask questions about software testing while using Code Defenders. The research then includes an empirical study conducted with university software testing students to determine how they use the smart assistant. The study reveals that students have unrealistic expectations of the assistant, ask questions with little detail, and often struggle to use the responses generated by the model effectively.
Moreover, many students try to use AI Defenders to obtain the complete solution to the exercises directly, instead of guidelines for solving them. Consequently, the introduction of the smart assistant shows benefits only for a small percentage of students, and the average performance of students who use the assistant is worse than that of students who do not. The aim of this research is therefore to warn about the risks that LLM-based tools can introduce in educational settings when used in an uncontrolled way and without supervision. The observations reported in this study also suggest the need to train students to use such tools correctly.