Improving relational database replication with GlusterFS in fog environments

Fog computing is a recent paradigm that aims to create a continuum between Cloud and Edge computing. Instead of assuming that the Edge only gathers data from local devices and sensors, while the Cloud processes that data efficiently and at scale, a Fog environment allows data and computation to move from the Cloud to the Edge and vice versa, so that data can be delivered to the end user more efficiently. The need to move data closer to computation is particularly justified for data-intensive applications, which work on large amounts of data. The performance of such applications would suffer greatly from latency if they had to access data remotely, which is why making data locally available to the application is fundamental. Furthermore, the same data set can be accessed by several users, so data movement must be coupled with a proper replication approach. The goal of this thesis is to propose a solution that provides data replication in a dynamic distributed system, so as to make data locally available to several geographically distributed entities. To achieve this goal, several tools are used, such as Kubernetes and GlusterFS. In particular, we assume that data are stored in relational database management systems (e.g., MySQL), and the proposed approach is able to overcome some of the limitations of this type of system, namely the data locking required before replicas can be provided. Four possible solutions are presented, each combining the replication features offered by MySQL and GlusterFS in a different way. The thesis compares these methods in terms of functionality and of the time required to set up a consistent replica of the data.

Fog computing is a recent paradigm that aims to create a continuum between Cloud computing and Edge computing. Rather than assuming that the Edge only collects data from local devices and sensors while the Cloud only computes on that data, in a Fog environment data and computation can be moved from the Cloud to the Edge and vice versa, so as to deliver data to the end user more efficiently. The need to move data closer to computation is particularly justified for data-intensive applications. These applications need to work on large amounts of data, so their performance would be heavily affected by latency if they had to access data remotely. This is why it is essential to make data locally available to the application. Moreover, the same data can be accessed by multiple users, so data movement must be carried out together with an appropriate replication method. The goal of this thesis is to propose a solution for replicating data in a dynamic distributed system, so as to make data locally available to multiple geographically distributed entities. To reach this goal, several tools are used, such as Kubernetes and GlusterFS. In particular, we assume that data are stored in relational database management systems (such as MySQL), and that the proposed approach is able to overcome the limitations imposed by this type of system regarding the need to lock data before replicating them. Four possible solutions are presented, adopting different replication techniques offered by MySQL and GlusterFS. This thesis offers a comparison of these methods in terms of functionality and the time required to provide a replica of the data.