BlastFunction: an FPGA-as-a-service system for accelerated serverless computing

The last decade saw the exponential growth of cloud computing as the primary technology to develop, deploy and maintain complex infrastructures and services at scale. Cloud computing allows resources to be consumed on demand, and designing web services with a cloud-native approach is fundamental to scaling performance dynamically. However, some workloads require computing power that current CPUs cannot provide and, for this reason, heterogeneous computing is becoming an attractive way to keep meeting Service Level Agreements (SLAs) in the cloud. Field Programmable Gate Arrays (FPGAs) are one of the possible ways to employ heterogeneous computing in cloud scenarios. Because requests to services can arrive at unpredictable rates, the underlying FPGA may not be in use 100% of the time. From a cloud provider's perspective, sharing the device would improve its time utilization, and the serverless computing paradigm is a promising approach in this sense, as resource management is delegated to the cloud provider and each functionality is scaled according to the exact need of the moment. Within this context, we propose that compute-intensive kernels be accelerated with shared FPGAs handled transparently by the serverless infrastructure: this will maximize utilization while achieving near-native execution latency. In this thesis work we propose BlastFunction, a distributed FPGA sharing system for the acceleration of microservices and serverless applications in cloud environments. BlastFunction provides a transparent and scalable system enabling multi-tenancy in the cloud FPGA scenario, with a vendor-independent and reconfiguration-aware allocation strategy integrated with an existing cloud orchestrator.
The system includes a Remote OpenCL Library to access the shared devices transparently and through a well-known interface; multiple Device Managers, which offer the underlying devices using a time-sharing approach and expose relevant metrics; and a central Accelerators Registry, which allocates the available devices efficiently using runtime performance metrics and interacts with the Kubernetes orchestrator. We evaluated the system with three experiments: the first measures the introduced overhead, while the other two observe the behaviour in a small cluster, first with a single scaled function and then with multiple functions. In all the experiments, BlastFunction reached higher utilization and throughput thanks to the sharing of the device, with minimal differences in latency and dropped requests caused by the concurrent accesses and the additional I/O latencies.

Recent years have seen the exponential growth of cloud computing as the primary technology for developing, deploying and maintaining complex infrastructures and large-scale services. Cloud computing allows resources to be consumed on demand, and designing web services with a cloud-native approach is fundamental to scaling performance dynamically. Unfortunately, some workloads require computing power that CPUs cannot provide and, for this reason, heterogeneous computing is becoming a possible solution for meeting SLAs. FPGAs are one of the possible technologies for employing heterogeneous computing in cloud scenarios. Since requests from the external network can arrive at an unpredictable rate, the underlying FPGA may not be in use all of the time. From the cloud provider's perspective, sharing resources would improve FPGA utilization, and in this sense the serverless computing paradigm is a promising approach, since server management is delegated to the cloud provider and each functionality is scaled on demand. In this context, we propose that compute-intensive kernels be accelerated by shared FPGAs managed transparently by the serverless infrastructure: this will maximize utilization while achieving latency close to native execution. In this thesis work we propose BlastFunction, a distributed FPGA sharing system that accelerates microservices and serverless applications in cloud environments. BlastFunction offers a transparent and scalable system that enables multi-tenancy in a cloud FPGA scenario, with a vendor-independent and reconfiguration-aware allocation strategy integrated with a commercially available orchestrator.
The system comprises a Remote OpenCL Library that lets applications access the shared FPGA transparently and through a well-known interface; multiple Device Managers that offer the underlying devices with a time-sharing approach and expose the related metrics; and a central Accelerators Registry that allocates the available devices effectively using the metrics collected at runtime, interacting with the Kubernetes orchestrator. We evaluated the proposed system with three experiments to observe first the introduced overhead, then the behaviour in a small cluster with a single application, and finally with multiple applications. In all the experiments, BlastFunction achieved higher device utilization and higher throughput thanks to the sharing of the FPGA, with minimal differences in latency and in the number of unserved requests, caused by concurrent access and additional I/O latencies.
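To make the idea of a reconfiguration-aware, metrics-driven allocation more concrete, the following is a minimal Python sketch of how a registry might choose a device for a new function instance. It is not the allocation algorithm of BlastFunction: the `Device` record, the `allocate` function, the `max_utilization` threshold and the policy itself (prefer the least-utilized device that is already configured with the required bitstream, otherwise reconfigure an idle one) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Device:
    """Hypothetical view of one FPGA as reported by its Device Manager."""
    name: str
    bitstream: Optional[str]  # accelerator currently configured, None if blank
    utilization: float        # fraction of time busy in the last window (0.0 to 1.0)


def allocate(
    devices: List[Device],
    wanted_bitstream: str,
    max_utilization: float = 0.8,  # assumed threshold, not from the thesis
) -> Tuple[Optional[Device], bool]:
    """Return (device, needs_reconfiguration) for a new function instance.

    Assumed policy: share the least-utilized device that already holds the
    wanted bitstream and is below the threshold; otherwise pick a blank or
    idle device, which must be reconfigured; otherwise report no device.
    """
    matching = [
        d for d in devices
        if d.bitstream == wanted_bitstream and d.utilization < max_utilization
    ]
    if matching:
        # Sharing an already configured device avoids a reconfiguration.
        return min(matching, key=lambda d: d.utilization), False

    idle = [d for d in devices if d.bitstream is None or d.utilization == 0.0]
    if idle:
        # Reconfiguring an unused device does not disturb other tenants.
        return idle[0], True

    return None, False
```

For example, given two devices configured with a hypothetical `sobel` bitstream at 60% and 30% utilization plus one blank device, a request for `sobel` would be placed on the 30% device without reconfiguration, while a request for a different bitstream would be placed on the blank device and trigger a reconfiguration.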