The rapid growth in the volume, variety, and frequency of geospatial data demands advanced computing techniques for efficient processing and analysis, and it exposes the scalability, cost, and accessibility limits of traditional Geographic Information Systems (GIS). This thesis presents a geospatial data pipeline built on Amazon Web Services (AWS) that integrates cloud computing with GIS, using AWS’s scalable infrastructure to overcome these limitations. The pipeline comprises five layers: ingestion (collects data from internal and external sources), storage (manages and preserves data using Amazon S3), cataloging (provides unified data governance with AWS Glue), processing (improves raw data quality through standardization and cleaning using AWS Glue DataBrew), and consumption (lets users query, analyze, and visualize data and make machine learning (ML) predictions with Amazon Athena, QuickSight, and SageMaker Canvas). Its effectiveness is demonstrated with environmental data from Milan, Lombardia. The automated architecture moves datasets from the raw zone (unprocessed data) to the trusted zone (cleaned and refined data), enabling comprehensive analysis, visualization, and ML. This enhances system capacity, performance, and reliability.
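As a rough illustration of the five-layer structure and the raw-to-trusted flow described in the abstract, the following minimal Python sketch models the layers as plain data and shows one possible cleaning step. It is not taken from the thesis: the `Layer` class, the `raw/` and `trusted/` key prefixes, and the `trusted_key` and `clean_record` helpers are hypothetical names chosen for illustration, and the cleaning rules (trim whitespace, lowercase column names, drop records with empty values) are assumptions standing in for whatever AWS Glue DataBrew recipe the author used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Layer:
    """One pipeline layer: its name, its role, and the AWS services named for it."""
    name: str
    role: str
    services: Tuple[str, ...]


# Layer names, roles, and services as listed in the abstract.
# The abstract names no specific service for ingestion, so it is left empty.
PIPELINE = (
    Layer("ingestion", "collect data from internal and external sources", ()),
    Layer("storage", "manage and preserve data", ("Amazon S3",)),
    Layer("cataloging", "unified data governance", ("AWS Glue",)),
    Layer("processing", "standardize and clean raw data", ("AWS Glue DataBrew",)),
    Layer(
        "consumption",
        "query, analyze, visualize, and predict",
        ("Amazon Athena", "Amazon QuickSight", "Amazon SageMaker Canvas"),
    ),
)


def trusted_key(raw_key: str) -> str:
    """Map a raw-zone object key to a trusted-zone key (hypothetical prefixes)."""
    prefix = "raw/"
    if not raw_key.startswith(prefix):
        raise ValueError(f"not a raw-zone key: {raw_key!r}")
    return "trusted/" + raw_key[len(prefix):]


def clean_record(record: dict) -> Optional[dict]:
    """Standardize one record, or return None if any value is empty.

    Assumed rules: strip whitespace from names and values,
    lowercase column names, reject records with missing values.
    """
    cleaned = {}
    for key, value in record.items():
        text = str(value).strip()
        if not text:
            return None
        cleaned[key.strip().lower()] = text
    return cleaned
```

For example, `trusted_key("raw/milan/air_quality.csv")` yields `"trusted/milan/air_quality.csv"`, and `clean_record` returns `None` for any record that contains an empty value. In the actual pipeline these steps would run inside managed AWS services rather than in local Python.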
Synergizing geospatial data and cloud computing: establishing an AWS pipeline
YMERI, ERNESA
2023/2024
File | Description | Size | Format
---|---|---|---
Thesis_ErnesaYmeri.pdf (not accessible) | Ernesa Ymeri - Master Thesis | 3.79 MB | Adobe PDF
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/222154