Software design with large language models: analysis and optimisation of domain-specific model generation techniques

The goal of this thesis is to investigate whether Large Language Models (LLMs) can be used to automate the software design phase, focusing specifically on the generation of an Interaction Flow Modeling Language (IFML) model that represents the user interface of a front-end application. The work first presents an overview of the current state of research in this field. It highlights the main problems identified in the available literature and outlines the various ways in which the same problem can be presented to an LLM, with the aim of finding the most effective techniques for instructing it to achieve the best results (a practice known as "prompt engineering").

The experimental part then pursues two objectives. First, a given domain specification is translated into an IFML model using a set of different techniques, and the results obtained are compared. Second, the experiments are repeated while requesting variants of the original model with increasing levels of complexity, in order to study how the number of errors varies with model size and thus check whether performance remains stable. The evaluation was mainly automated, so that a large number of tests could be run to support the scientific validity of the results. However, because automatic evaluation has limitations, which are discussed mainly in the final section of the thesis, a qualitative manual evaluation was also conducted on a small sample of experiments to provide a more reliable assessment of the results.

The results indicate that each technique has strengths and weaknesses depending on the specific error category and on the complexity of the model under test. Furthermore, the relative error rate appears to remain roughly stable as model complexity increases, which suggests that the number of errors grows approximately linearly with the size of the model.
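The inference in the last sentence can be made explicit with a short worked relation. Here N is an assumed measure of model size (for instance, the number of IFML elements in the generated model), E(N) is the number of errors found in a model of size N, and r(N) is the relative error rate; these symbols are introduced only for illustration. If r(N) stays roughly constant at some value r_0 across sizes, then the error count is approximately proportional to the size:

```latex
r(N) = \frac{E(N)}{N} \approx r_0
\quad\Longrightarrow\quad
E(N) \approx r_0 \, N
```

Under this reading, doubling the model size would roughly double the expected number of errors, while the error rate per element stays about the same.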
