On how to effectively target FPGAs from domain specific tools

Heterogeneous System Architectures (HSAs) are a promising way to overcome the limitations of modern homogeneous architectures in terms of both performance and power efficiency. By combining hardware accelerators such as GPUs, FPGAs, and dedicated ASICs, these systems can efficiently run performance-demanding applications from different domains (e.g., image and signal processing, linear algebra, computational biology) on the device best suited to each domain. To take full advantage of HSAs, new programming models and tools have emerged in recent years that target such architectures efficiently, in terms of both final performance and productivity. Domain-Specific Languages (DSLs) and Machine Learning (ML) frameworks are two significant examples: both let users quickly and easily develop portable and efficient designs for multiple architectures. However, although DSLs and ML frameworks are highly effective at helping users generate efficient designs for CPUs and GPUs, they still lack concrete support for FPGAs. Indeed, even though FPGA toolchains have improved significantly and gained many features in recent years, the overall FPGA design process remains complex, and integration with high-productivity tools and languages is still limited. For these reasons, this research project focuses on developing tools that efficiently and easily target FPGAs from domain-specific scenarios. In particular, it comprises both a framework for the fast prototyping and deployment of CNN accelerators on FPGA, and FROST, a unified backend to efficiently hardware-accelerate DSLs on FPGAs. On one hand, the goal of the CNN framework is to bridge the gap between high-productivity ML frameworks, like TensorFlow and Caffe, and the FPGA design process.
The framework automates the CNN implementation flow on FPGA, supports Caffe descriptions of the network, and provides a C++ library to design dataflow accelerators, as well as an integration with TensorFlow to train the network. On the other hand, starting from an algorithm described in one of the supported DSLs, FROST translates it into its Intermediate Representation (IR), performs a series of FPGA-oriented optimization steps, and, finally, generates an optimized design suitable for FPGA tools. To better leverage the features of the FPGA and enhance performance, FROST provides a high-level scheduling co-language that users can exploit to choose which optimizations to apply and to specify the architecture to implement. This makes it easy to evaluate different hardware designs and choose the one best suited to the input algorithm.


DEL SOZZO, EMANUELE


Advisor: SANTAMBROGIO, MARCO DOMENICO
Coordinator: PERNICI, BARBARA
Tutor: BOLCHINI, CRISTIANA
Date: 6-feb-2019
Document type: PhD thesis
Appears in collections: PhD theses

Attached files

File	Size	Format
thesis.pdf (Open Access from 22/01/2020; description: thesis)	2.36 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/144272