TiReX: tiled regular eXpressions matching architecture

In recent years, advances in technology and the availability of systems with large amounts of computational power have brought computer science face to face with what is now called Big Data. Huge amounts of data are produced every day and, since these data carry relevant information, they must be analyzed efficiently and, most importantly, in a reasonable amount of time. Among the various approaches to extracting information and patterns from data, Regular Expressions (REs) are used to find user-defined patterns in a large variety of data sources and for many different purposes. REs are applied in many fields, ranging from simple find-and-replace functionality in text editors to database querying, and from DNA analysis to Deep Packet Inspection for protecting IT systems. Despite their diversity, these scenarios pose similar performance challenges. For example, DNA can contain up to 3 billion characters, while network packets travel at more than 10 Gb/s. A system able to recognize patterns in real time is therefore needed, and pure-software solutions are often unable to meet such requirements. On the other hand, the hardware-based solutions proposed so far, which typically embed the patterns in the circuit logic, are not adequate for several scenarios where REs are employed. We therefore propose an architecture based on a matching core, in which REs are compiled in software into instructions that are then run against the input data. The instructions can be easily updated, so the RE to be matched can be changed in a very flexible way. The architecture is implemented on an FPGA device, which accelerates the whole matching process. We build a multi-core system whose performance increases proportionally with the number of matching cores, and that number can easily scale up with the available resources.
We evaluate the proposed architecture by comparing its performance against the best-performing software solution.
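
To make the idea of compiling an RE into instructions concrete, the following is a minimal software sketch, not the TiReX instruction set or its hardware core. It assumes a hypothetical five-instruction set (CHAR, ANY, SPLIT, JMP, MATCH) and a small RE subset (literals, `.`, `|`, `*`, `+`, `?`, parentheses); the function names `compile_re` and `run` are illustrative only. The matcher tracks a set of active program counters, so every input character is consumed exactly once.

```python
def parse(pattern):
    """Parse a small RE subset into a nested-tuple syntax tree."""
    pos = 0

    def alt():
        nonlocal pos
        node = concat()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            node = ("alt", node, concat())
        return node

    def concat():
        items = []
        while pos < len(pattern) and pattern[pos] not in "|)":
            items.append(repeat())
        return ("cat", items)

    def repeat():
        nonlocal pos
        node = atom()
        while pos < len(pattern) and pattern[pos] in "*+?":
            node = (pattern[pos], node)
            pos += 1
        return node

    def atom():
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        if ch == "(":
            node = alt()
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        return ("any",) if ch == "." else ("char", ch)

    tree = alt()
    if pos != len(pattern):
        raise ValueError("unexpected ')'")
    return tree


def emit(node, prog):
    """Append the instructions for one syntax-tree node to prog."""
    kind = node[0]
    if kind == "char":
        prog.append(("CHAR", node[1]))
    elif kind == "any":
        prog.append(("ANY",))
    elif kind == "cat":
        for child in node[1]:
            emit(child, prog)
    elif kind == "alt":
        split = len(prog)
        prog.append(None)                      # patched below
        emit(node[1], prog)
        jmp = len(prog)
        prog.append(None)                      # patched below
        prog[split] = ("SPLIT", split + 1, len(prog))
        emit(node[2], prog)
        prog[jmp] = ("JMP", len(prog))
    elif kind == "+":
        start = len(prog)
        emit(node[1], prog)
        prog.append(("SPLIT", start, len(prog) + 1))
    else:                                      # "*" or "?"
        split = len(prog)
        prog.append(None)                      # patched below
        emit(node[1], prog)
        if kind == "*":
            prog.append(("JMP", split))
        prog[split] = ("SPLIT", split + 1, len(prog))


def compile_re(pattern):
    """Compile a pattern into a flat instruction list ending in MATCH."""
    prog = []
    emit(parse(pattern), prog)
    prog.append(("MATCH",))
    return prog


def add_thread(prog, pc, threads):
    """Activate pc, following SPLIT/JMP without consuming input."""
    stack = [pc]
    while stack:
        pc = stack.pop()
        if pc in threads:
            continue
        threads.add(pc)
        op = prog[pc]
        if op[0] in ("JMP", "SPLIT"):
            stack.extend(op[1:])


def run(prog, data):
    """Return True if the program matches any substring of data."""
    threads = set()
    for ch in data:
        add_thread(prog, 0, threads)           # a match may start here
        if any(prog[pc][0] == "MATCH" for pc in threads):
            return True
        nxt = set()
        for pc in threads:
            op = prog[pc]
            if op[0] == "ANY" or (op[0] == "CHAR" and op[1] == ch):
                add_thread(prog, pc + 1, nxt)
        threads = nxt
    add_thread(prog, 0, threads)
    return any(prog[pc][0] == "MATCH" for pc in threads)
```

In this sketch, changing the RE only means loading a different instruction list into the same `run` loop, for example `run(compile_re("a(b|c)*d"), "xxacbbd")` followed by `run(compile_re("ab+"), "abbb")`. This mirrors, in software, the flexibility argument made above for an instruction-based core over patterns embedded in the circuit logic.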

In recent years, with technological progress and the production of systems equipped with great computational power, the world of computer science has found itself facing what is nowadays called the era of Big Data. Every day an enormous amount of data is produced, and there is a growing need to analyze it in ever more efficient ways and in reasonable time. Several approaches are available for extracting information from data, and one of them relies on Regular Expressions (REs), which are used to find user-defined patterns across many types of data. REs can be applied in different fields, ranging from simple find-and-replace functionality in text editors to database queries, and from DNA analysis to Packet Inspection for protecting IT systems. Despite their diversity, these scenarios pose very similar performance challenges. DNA contains up to 3 billion characters, while network packets travel at more than 10 Gb/s. A system able to recognize patterns in real time is therefore needed, and REs cannot meet such requirements through purely software solutions. On the other hand, the hardware solutions proposed so far, in which the patterns are embedded directly in the circuit logic, are not feasible in some scenarios where REs are employed. This is why we decided to design an architecture based on a custom processor that performs the pattern matching, in which REs are compiled in software into instructions to be executed on the data. The instructions can be easily updated, so the RE to be analyzed can be changed in a very flexible way. The solution has been implemented on an FPGA to accelerate the whole RE matching process. In addition, our solution provides a multi-core system that can considerably increase performance.
We validated the architecture by comparing its performance against the best-performing software solution.