Framework for data validation within tendering documents developing a cost domain ontology
Gatto, Chiara
2024/2025
Abstract
Effective cost estimation for tendering plays a critical role in the building construction process, enabling efficient investment management and ensuring successful execution of the construction phase. The traditional cost estimation procedure requires practitioners to manually extract information from documents written in natural language in order to relate cost information to object data. This activity demands deep experience and considerable manual effort from practitioners, and it often results in errors and, in the worst case, in legal disputes. Several studies are currently under way with the aim of supporting practitioners' work and making the process more efficient. A correct cost estimation analysis makes it possible to predict cost, time and other resources, both for effectively planning the investments needed to carry out the building work and for successfully managing the construction phase. This research investigates a framework for validating cost information within textual documents in order to prevent inconsistent information. When the cost information contained in different project documents is inconsistent, misunderstandings arise, leading to inefficiency, wasted resources and, in the worst cases, legal disputes. The framework developed comprises four modules. The first module consists of developing a cost domain ontology and its semantics as the foundation layer for applying natural language processing (NLP) techniques. The second module defines an automated approach for extracting data from textual documents according to the previously defined cost domain ontology. The third retrieves information from BIM models. Finally, the last two modules consist of comparing the data extracted from different textual documents and verifying their consistency.
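As a rough illustration of the comparison and consistency-checking step described above, the following minimal Python sketch compares cost items extracted from two documents. The `CostItem` fields, the `find_inconsistencies` function, the item codes and the 1% price tolerance are all hypothetical and are not taken from the thesis; the ontology-driven, LLM-based extraction that would produce such items is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CostItem:
    """One cost entry extracted from a document (hypothetical schema)."""
    code: str          # item identifier, e.g. a price-list code
    unit: str          # unit of measure
    unit_price: float  # price per unit


def find_inconsistencies(doc_a, doc_b, rel_tol=0.01):
    """Return (code, issue) pairs for items that disagree between two documents.

    rel_tol is an assumed relative tolerance on unit prices.
    """
    index_b = {item.code: item for item in doc_b}
    issues = []
    for item in doc_a:
        other = index_b.get(item.code)
        if other is None:
            issues.append((item.code, "missing in second document"))
            continue
        if item.unit != other.unit:
            issues.append((item.code, "unit mismatch"))
        if not math.isclose(item.unit_price, other.unit_price, rel_tol=rel_tol):
            issues.append((item.code, "unit price mismatch"))
    # Items that appear only in the second document.
    codes_a = {item.code for item in doc_a}
    for item in doc_b:
        if item.code not in codes_a:
            issues.append((item.code, "missing in first document"))
    return issues


# Illustrative data only: codes, units and prices are invented.
estimate = [
    CostItem("A.01", "m2", 25.0),
    CostItem("A.02", "m3", 110.0),
    CostItem("A.03", "kg", 1.2),
]
price_list = [
    CostItem("A.01", "m2", 25.0),
    CostItem("A.02", "m2", 118.5),
]

issues = find_inconsistencies(estimate, price_list)
for code, issue in issues:
    print(f"{code}: {issue}")
```

Keying the comparison on an item code is a simplifying assumption; when documents do not share identifiers, items would first have to be aligned through the ontology's classes and properties before any such check could run.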
The results of this research include the development of a cost domain ontology, which was tested by structuring a large dataset derived from the Lombardy Region Price List and then successfully applied to broader datasets. Another key contribution is the implementation of an innovative approach that applies GPT large language models (LLMs) to data structuring in the construction sector according to the defined ontology, which also served to validate the ontology itself. Finally, a representative case study was used to validate the last modules and to confirm that the framework can detect inconsistencies related to the cost domain across design documents. The classification component was validated using the Italian Price List documents as a case study, while the overall framework was tested on a representative dataset, demonstrating its effectiveness in identifying inconsistencies across multiple textual sources.

| File | Access | Size | Format |
|---|---|---|---|
| Framework for data validation within tendering documents developing a cost domain ontology.pdf | Openly accessible online to everyone | 19.32 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/244198