High performance stream reasoning: a highly parallel implementation of C-SPARQL in CUDA

Stream Reasoning (SR) is a growing field that combines the expressivity of the Semantic Web with the functionality of Data Stream Management Systems. Applications in this area may require fast, time-critical processing, which calls for high-performance systems. In addition, the volume of semantic data has grown in recent years, driven both by Internet applications with a high information production rate (e.g. social media) and by the spread of sensors, whose output benefits from semantic annotation. As a result, faster and more scalable query engines are needed. In this context, C-SPARQL is a query language that extends standard SPARQL with streaming semantics, enabling stream reasoning. Current C-SPARQL engines are based on sequential programming and are built on SPARQL endpoints coupled with specialised tools that handle the streaming semantics. The aim of this work is to verify whether a parallel implementation of C-SPARQL can achieve better performance than existing engines. We propose CUrdf, a C-SPARQL engine built on a highly parallel architecture, the Graphics Processing Unit (GPU), using CUDA. General-Purpose computing on GPU (GPGPU) has become a leading trend for developing high-performance systems and processing large volumes of data. CUDA is an application programming interface developed by NVIDIA that enables GPGPU on CUDA-enabled devices. The proposed model is based on translating C-SPARQL queries into relational algebra and implementing the relational operators on the GPU. As stream processing requires a dynamic flow of incoming and outgoing data, we used an incremental approach to create and update the database. In addition, to simplify the implementation, we introduced a hashing procedure for incoming triples and a hash map lookup for outgoing ones.
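The encoding step described above can be illustrated with a minimal sketch. It is not CUrdf's implementation: the abstract does not say how hash values are computed, so this example assumes sequential integer IDs assigned through a Python dict (itself a hash map), and the names `TermDictionary` and `select` are hypothetical. The sequential loop in `select` stands in for a data-parallel relational selection, where each triple could be tested independently.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


class TermDictionary:
    """Hypothetical helper: maps RDF terms to integer IDs and back."""

    def __init__(self) -> None:
        self._id_of: Dict[str, int] = {}   # term -> ID (incoming triples)
        self._term_of: List[str] = []      # ID -> term (outgoing triples)

    def encode(self, term: str) -> int:
        # Assign the next free ID the first time a term is seen.
        if term not in self._id_of:
            self._id_of[term] = len(self._term_of)
            self._term_of.append(term)
        return self._id_of[term]

    def decode(self, term_id: int) -> str:
        return self._term_of[term_id]

    def encode_triple(self, triple: Triple) -> EncodedTriple:
        s, p, o = triple
        return (self.encode(s), self.encode(p), self.encode(o))

    def decode_triple(self, triple: EncodedTriple) -> Triple:
        s, p, o = triple
        return (self.decode(s), self.decode(p), self.decode(o))


def select(
    triples: List[EncodedTriple],
    s: Optional[int] = None,
    p: Optional[int] = None,
    o: Optional[int] = None,
) -> List[EncodedTriple]:
    """Relational selection for one triple pattern; None acts as a variable."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    d = TermDictionary()
    data = [(":a", ":knows", ":b"), (":b", ":knows", ":c"), (":a", ":likes", ":c")]
    encoded = [d.encode_triple(t) for t in data]
    matches = select(encoded, p=d.encode(":knows"))
    print([d.decode_triple(t) for t in matches])
```

Working on fixed-width integers rather than strings keeps every comparison in an operator a single integer test, which is the kind of uniform work that maps well onto many parallel threads.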
CUrdf is compared, in static and streaming scenarios, with two state-of-the-art SPARQL engines, RDFox and RDF4J, extended to support the streaming paradigm. The results show that, in streaming scenarios, CUrdf is much faster than the SPARQL engines, while in static scenarios the relative execution time is query-dependent. In static processing, CUrdf exhibits an "activation time": it pays a fixed overhead to invoke each of the operators that compose the query. When operators produce few results, this overhead dominates and CUrdf performs worse than the SPARQL engines. However, as soon as each operator generates enough intermediate results, CUrdf outperforms them by exploiting the computational capabilities of the GPU. In general, the number of elements generated by the query, both as output and as intermediate results, determines which of the two implementations performs better for static query processing. For streaming, instead, the adoption of a simple, incremental data structure to store windows allows CUrdf to achieve better performance on window execution.
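To make the idea of an incrementally maintained window concrete, here is a minimal sketch of a time-based sliding window. It is an illustration under stated assumptions, not CUrdf's data structure: the class name `SlidingWindow`, the `range_s` parameter, the use of a deque, and the requirement that timestamps arrive in non-decreasing order are all assumptions made for this example.

```python
from collections import deque
from typing import Deque, List, Tuple

EncodedTriple = Tuple[int, int, int]


class SlidingWindow:
    """Hypothetical time-based window covering the interval (now - range_s, now]."""

    def __init__(self, range_s: float) -> None:
        self.range_s = range_s
        # Entries are kept in arrival order; timestamps are assumed non-decreasing.
        self._items: Deque[Tuple[float, EncodedTriple]] = deque()

    def push(self, timestamp: float, triple: EncodedTriple) -> None:
        # Incremental update: a new triple is appended, nothing is rebuilt.
        self._items.append((timestamp, triple))

    def advance(self, now: float) -> List[EncodedTriple]:
        # Evict only the expired prefix, then return the current window content.
        cutoff = now - self.range_s
        while self._items and self._items[0][0] <= cutoff:
            self._items.popleft()
        return [triple for _, triple in self._items]


if __name__ == "__main__":
    w = SlidingWindow(range_s=10.0)
    w.push(1.0, (0, 1, 2))
    w.push(6.0, (2, 1, 3))
    w.push(12.0, (0, 4, 3))
    print(w.advance(now=12.0))
    print(w.advance(now=16.0))
```

Because each step only appends new triples and drops an expired prefix, the cost of moving the window depends on how much data changed rather than on the total window size.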
