An FPGA toolchain for Graph Neural Network acceleration using high-level synthesis

Graph Neural Networks (GNNs) are a class of Machine Learning models that have emerged as an efficient approach to processing graph-structured data, in domains ranging from social networks to molecular chemistry. Particularly in the era of Big Data, dedicated hardware accelerators are often required to achieve optimal computational performance when processing large amounts of information. FPGAs offer a promising solution thanks to their inherent parallelism, but translating GNN models into FPGA accelerators is complex and requires extensive expertise. This thesis addresses the challenge of accelerating GNNs on FPGAs by introducing a comprehensive toolchain that simplifies the transition from the high-level PyTorch framework to synthesized hardware accelerators, leveraging High-Level Synthesis and the MLIR compiler infrastructure. Torch-MLIR is used to produce the MLIR representation of the GNN model, which serves as input to the synthesizer. At this stage, fine-tuned optimizations can be applied before generating the final GNN accelerator, which targets improved inference performance on FPGA architectures. Experimental results demonstrate the efficacy of the toolchain, confirming substantial improvements in both performance and resource utilization. These results were made possible by identifying model bottlenecks and by studying how to optimize matrix multiplication operations, which proved to be a critical component of GNN computations. In conclusion, this thesis represents a significant advancement in the domain of FPGA-accelerated GNN models. By developing an accessible and versatile toolchain and exploring synthesis optimizations, the research sets the stage for more efficient and more widely accessible FPGA-accelerated GNN implementations.

Graph Neural Networks are a class of Machine Learning models and represent the default approach to handling graph-structured data, covering domains that range from social networks to molecular chemistry and beyond. Particularly in the contemporary era of Big Data, dedicated hardware accelerators are often needed to obtain optimal computational performance when processing large amounts of information. FPGAs offer a promising solution thanks to their intrinsic parallelism, but the process of translating Graph Neural Network models into FPGA accelerators is complex and requires extensive knowledge and experience. This thesis addresses the challenge of accelerating Graph Neural Networks on FPGAs, introducing a complete toolchain that simplifies the transition from the high-level PyTorch framework to hardware accelerators synthesized through High-Level Synthesis, leveraging the MLIR compiler infrastructure. Torch-MLIR generates the MLIR representation of the model, which is used as input to the synthesizer. In this last phase, several optimizations can be applied before the accelerator is generated, improving inference performance on FPGA architectures in a targeted way. The experimental results demonstrate the effectiveness of the toolchain, confirming substantial improvements in both performance and resource utilization. This result was made possible by identifying the bottlenecks of the model and by studying the optimization of matrix multiplication operations, which proved to be a fundamental component of GNN computations. In conclusion, this thesis represents significant progress in the field of FPGA-accelerated GNN models. By developing a versatile toolchain and exploring synthesis optimizations, the research lays the foundations for more efficient FPGA-accelerated GNN implementations.
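
The abstract identifies matrix multiplication as a critical component of GNN computations. As an illustration only, the sketch below assumes a graph-convolution layer of the common form ReLU(A · X · W); the abstract does not state which GNN model or layer the thesis uses. It counts the multiply-accumulate operations spent in the two matrix products. The names `matmul`, `gcn_layer`, and `adj_norm` are hypothetical and do not come from the toolchain.

```python
def matmul(a, b):
    """Dense product of two lists-of-lists; also returns the multiply-accumulate count."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]
            out[i][j] = acc
    return out, rows * inner * cols


def gcn_layer(adj_norm, features, weight):
    """Assumed layer form: ReLU(adj_norm @ features @ weight)."""
    # Feature transformation: (nodes x in_feats) @ (in_feats x out_feats).
    transformed, macs_transform = matmul(features, weight)
    # Neighbourhood aggregation: (nodes x nodes) @ (nodes x out_feats).
    aggregated, macs_aggregate = matmul(adj_norm, transformed)
    # Element-wise ReLU is the only work in the layer that is not a matrix product.
    activated = [[max(0.0, v) for v in row] for row in aggregated]
    return activated, macs_transform + macs_aggregate
```

In this dense sketch, a graph with N nodes, F input features, and F' output features costs N·F·F' + N·N·F' multiply-accumulates per layer, against only N·F' comparisons for the activation. That ratio is one way to see why the matrix products are a natural target for optimization.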