Prompt engineering's influence on result reliability in large language models
della Volpe, Nicola
2022/2023
Abstract
Large Language Models (LLMs) have become integral components of applications ranging from natural language understanding to content generation. As their significance grows, so does the need to scrutinize and evaluate their responses, given potential risks such as private data leaks and the generation of inappropriate, harmful, or misleading content. This master's thesis examines the assessment of LLM responses, introducing key metrics: correctness, coherence, relevance, and completeness. Recognizing that the quality of a response is intricately linked to the nature of the question posed, we explore prompt engineering as a crucial factor influencing response quality. To conduct a comprehensive, unbiased, and generalized evaluation, we use both multi-domain and domain-specific datasets, in multiple-choice format (well suited to measuring correctness) and open-answer format (used to study coherence, relevance, and completeness). Our analysis reveals that the effectiveness of prompt engineering is not universal: while optimizing prompts enhances response quality in some cases, in others it introduces unnecessary complexity, leading to inferior results.
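The abstract does not detail the experimental setup, but the kind of comparison it describes, scoring multiple-choice correctness under a plain prompt versus an engineered one, might look like the minimal sketch below. The prompt templates, the item format, and the `ask_llm` callable are all illustrative assumptions, not the thesis's actual code or data.

```python
# A minimal sketch (not the thesis's pipeline) of comparing multiple-choice
# correctness under two prompt styles. `ask_llm` is a hypothetical stand-in
# for whatever model call the experiment would use.
from typing import Callable

BASELINE_TEMPLATE = "{question}\nOptions: {options}\nAnswer:"
ENGINEERED_TEMPLATE = (
    "You are a careful domain expert. Read the question, reason step by step, "
    "then answer with the letter of the single best option.\n"
    "{question}\nOptions: {options}\nAnswer:"
)

def correctness(items: list[dict], template: str,
                ask_llm: Callable[[str], str]) -> float:
    """Fraction of multiple-choice items answered with the gold letter."""
    hits = 0
    for item in items:
        prompt = template.format(question=item["question"],
                                 options=" ".join(item["options"]))
        reply = ask_llm(prompt).strip().upper()
        hits += reply.startswith(item["gold"])  # gold is a letter, e.g. "B"
    return hits / len(items)

if __name__ == "__main__":
    # Toy item and a stub model; replace `ask_llm` with a real model call.
    items = [{"question": "2 + 2 = ?",
              "options": ["A) 3", "B) 4", "C) 5"],
              "gold": "B"}]
    ask_llm = lambda prompt: "B"
    print(correctness(items, BASELINE_TEMPLATE, ask_llm))
    print(correctness(items, ENGINEERED_TEMPLATE, ask_llm))
```

Running the same items through both templates makes the thesis's central observation testable: the engineered prompt may raise accuracy on some datasets and lower it on others, where the added instructions introduce unnecessary complexity.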
| File | Description | Access | Size | Format |
|---|---|---|---|---|
| 2024_04_della Volpe_Tesi_01.pdf | Thesis text | Authorized users only, from 19/03/2025 | 2.61 MB | Adobe PDF |
| 2024_04_della Volpe_Executive Summary_02.pdf | Executive summary text | Authorized users only, from 19/03/2025 | 621.78 kB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/218498