The rapid growth of European Union energy policy, driven by the Green Deal, has generated a vast corpus of complex legislative texts that are difficult to analyze, navigate, and reuse automatically. To facilitate regulatory analysis, compliance, and decision-making, it is essential to develop tools capable of transforming legal text into structured and interpretable representations. However, existing automated approaches face significant limitations: traditional NLP tools struggle with the rigid hierarchical structure of these documents, while Large Language Models (LLMs) often lack the necessary grounding, leading to hallucinations and context loss. This work proposes TREES-KG, a methodology for constructing structured and semantically grounded Knowledge Graphs from EUR-Lex Directives. The approach integrates three complementary layers: the document structure, the inter- and intra-document legal references, and an ontology-driven semantic layer aligned with legal and energy-domain ontologies. The framework is evaluated on the Energy Efficiency Directive (EU) 2023/1791, resulting in a Knowledge Graph with over 98,000 triples. Experimental results show near-perfect recall in structural and reference extraction and a semantic precision of 78.9%, assessed through an automated "LLM-as-a-Judge" validation. The resulting Knowledge Graph provides a reliable and interpretable foundation for downstream applications such as Graph-based Retrieval-Augmented Generation and Question Answering.
LLM-based knowledge graph construction for the legal sector: modeling semantics, document structure, and legal references
RIVA, MARTINA
2024/2025
| File | Description | Access | Size | Format |
|---|---|---|---|---|
| 2026_03_Riva.pdf | Thesis text | Accessible online to authorized users only | 925.75 kB | Adobe PDF |
| Executive_Summary_Martina_Riva.pdf | Executive summary of the thesis | Accessible online to authorized users only | 463.76 kB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/251602