In recent years, we have witnessed an uncontrollable growth in data availability. In this context, Big Data is a term used to indicate a set of data-related challenges. The first is volume, which relates to the sheer quantity of data. Solutions that address this challenge typically adopt a distributed architecture in order to remain efficient under heavy loads. The second is variety, which relates to the presence of multiple data formats. To face this problem, several systems perform graph-based computation in order to bring all data under a unified model. The third and last is velocity, which relates to the high rates at which data arrive. Solutions that tackle this problem perform fast, reactive computation. In particular, Stream Processing Systems process data in a timely manner, as soon as they arrive, performing a limited number of memory-related operations. They come in the form of Data Stream Management Systems (DSMS) and Complex Event Processing (CEP) systems. Big Data Stream Processing Engines (BigSPEs) are stream processing systems that also address either variety or volume. However, building a solution capable of addressing all three challenges is not an easy task. In this thesis, we try to provide a solution to this problem by building an eXtream Processing (XP) system. We investigate the following research question: is it possible to perform XP starting from BigSPEs? Due to its broad scope, this question is hard to answer. Thus, we narrow it by eliciting a set of requirements. The requirements lead us to a more precise research question: is it possible to perform CEP and DSMS operations starting from S2PEs? S2PEs are stream processing systems with DSMS functionalities that present a distributed architecture. However, they lack expressivity, which we can increase by adding CEP functionalities.
From the ensuing investigation, we identify the Event Processing Language (EPL) as the language able to integrate CEP and DSMS. Kafka Streams, in turn, is the system that provides good extensibility together with a sound underlying model. Thus, we narrow the investigation even further, focusing on the question: how can we port EPL onto Kafka Streams? In practice, this thesis investigates two design problems: (P1) providing a formal semantics for EPL, which currently exists only as implemented in Esper and OracleCEP, and (P2) extending Kafka Streams and the operational model behind it to include CEP functionalities. To solve P1, we introduce ElLiPsis, an EPL formalization covering both DSMS and CEP operators. To solve P2, we provide an extension of the Kafka Streams model, called KEPLr’s Model, which introduces the concept of type. Additionally, we present a proof-of-concept implementation based on the extended model, together with some possible future extensions, including a reinterpretation of the Kafka Streams model.
In recent years, we have been witnessing an uncontrollable growth in the availability of data. In this context, Big Data is a term used to indicate a series of data-related problems. The first is volume, which concerns the large quantity of data. Solutions that handle this aspect generally present a distributed architecture in order to remain efficient under heavy data loads. The second is variety, tied to the presence of numerous data formats. To face this problem, several systems adopt a graph-based model in order to bring the data under a single model. The third and last problem is velocity, which concerns the high rates at which data arrive. Solutions that face this problem are able to compute results very quickly. In particular, Stream Processing systems process data promptly while keeping storage to a minimum. These systems come in the form of Data Stream Management Systems (DSMS) or Complex Event Processing (CEP) systems. Big Data Stream Processing Engines (BigSPEs) are Stream Processing systems that can also handle variety or volume. However, a solution capable of facing all three problems is not straightforward. In this thesis, we try to provide a solution to all three problems: an eXtream Processing (XP) system. We focus on the following research question: is it possible to perform XP starting from BigSPEs? Given its broad scope, it is hard to answer. We therefore simplify it by making explicit a set of requirements, which lead us to a more precise research question: is it possible to perform CEP and DSMS operations starting from S2PEs? S2PEs are stream processing systems with DSMS functionalities that present a distributed architecture. However, they lack expressivity, which we can increase by adding CEP functionalities.
From the subsequent investigation, we identify the Event Processing Language (EPL) as the language able to integrate CEP and DSMS. Kafka Streams is the system that offers extensibility together with a solid underlying model. We therefore narrow the investigation further, focusing on the question: how can we port EPL onto Kafka Streams? In practice, this thesis investigates two design problems: (P1) providing a formalization of EPL, since it is implemented only in Esper and OracleCEP, and (P2) extending Kafka Streams and its operational model to include CEP functionalities. To solve P1, we present ElLiPsis, a formalization of EPL that covers both DSMS and CEP operators. To solve P2, we provide an extension of the Kafka Streams model, called KEPLr’s Model, which introduces the concept of type. In addition, we present a proof-of-concept implementation based on the extended model, together with some possible future extensions, including a reinterpretation of the Kafka Streams model.
Towards extream processing with KEPLr
LANGHI, SAMUELE
2018/2019
File | Size | Format |
---|---|---|
tesi.pdf (openly accessible on the internet; description: text of the KEPLr thesis) | 9.35 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/153935