In the modern healthcare landscape, advances in technology have led to a significant increase in the generation and use of data across various applications. This data includes personal information, medical history, laboratory test results, images, and data generated by monitoring devices (including remote ones). This vast amount of data comes from different sources and can exist in multiple formats, creating the need for efficient management and storage systems. The data lake, thanks to its ability to store data in different formats and at any scale, is one such tool. However, it is very important to ensure that the data is well organized and that all privacy and security standards are followed, so that no personal information is revealed. A data catalog that uses metadata to provide additional information about the data can help solve this problem. Since each healthcare application has its own peculiarities, it is essential to develop a customized metadata model for the specific application, considering the structure and purpose of the data of interest. This thesis aims to identify such metadata models for use in the healthcare field. Since no such model was found in the existing literature, one was developed from scratch. The metadata model was then validated through a demo using the Apache Atlas platform. This thesis marks a significant step towards the creation of a complete metadata model and a data lake architecture ideal for healthcare applications, one that can also ensure efficient interoperability among different research institutions in Italy.
A minimum metadata model for healthcare data interoperability
KUMAR, PRIYANSH
2022/2023
Abstract
Advances in technology have sharply increased the volume of data generated and used across healthcare applications. This data includes an individual's medical history, personal information, imaging and laboratory test results, data from genomics-driven experiments, and data produced by monitoring devices. It comes from different sources and exists in various formats, which creates a pressing need for efficient management and storage systems. The data lake is one such system: it can store and integrate different types of data at any scale. The data must nevertheless be well organized, and all applicable privacy and security standards must be followed so that no personal information is disclosed. One way to manage such large volumes of healthcare data is a data catalog that uses metadata to provide additional information about the data. Because each domain (e.g., healthcare, oceanography, botany) has its own characteristics, the metadata model must be tailored to the domain, taking into account the structure and purpose of the data. This thesis therefore aims to identify metadata models that can be used effectively across applications in the healthcare domain. As no suitable model was found in the existing literature, we developed one from scratch. The model was then validated through a demo implementation on the Apache Atlas platform. This thesis is a step towards a comprehensive metadata model and data lake architecture suited to healthcare applications, with the goal of efficient healthcare data interoperability among research institutes in Italy.

File | Size | Format | |
---|---|---|---|
2023_05_KUMAR.pdf (openly accessible on the internet). Description: A Minimum Metadata Model for Health Data Interoperability | 1.46 MB | Adobe PDF | View/Open |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/204642