Comparing hls4ml and Vitis AI for CNN synthesis and evaluation on FPGA: a comprehensive study

Convolutional Neural Networks (CNNs) are a type of artificial neural network inspired by how the brain processes information. CNNs are currently among the most effective methods for computer vision and are also used in speech and natural language processing. These models have traditionally been executed on CPUs and GPUs, whose high power consumption and limited flexibility can constrain performance and efficiency. As a result, specialized hardware such as FPGAs and ASICs is increasingly being used to provide high-performance, energy-efficient solutions for executing CNNs in a variety of applications. In particular, FPGAs are a promising platform for hardware acceleration of CNNs because they combine high speed with low power consumption. In this master's thesis, the hls4ml framework is used to synthesize CNNs and execute the resulting models on an Alveo U280 FPGA. The Alveo U280 is an accelerator card developed by Xilinx and designed to provide high-performance, low-latency processing for compute-intensive workloads in data centers and cloud computing environments. The work consists of an in-depth analysis of the synthesized models, comparing them with those produced by another tool, Vitis AI. The thesis aims to show the capabilities of both tools and to identify when it is better to use one rather than the other. This work could serve as a reference for anyone who wants to use these tools to synthesize neural networks on FPGAs.
