Code generation for hybrid hardware-software mapping of deep convolutional neural networks on dataflow neural processing units

Deep Convolutional Neural Networks (DCNNs) are widely used for a variety of artificial intelligence (AI) tasks. This master's thesis develops an infrastructure for code generation for hybrid hardware-software mapping of DCNNs on dataflow Neural Processing Units (NPUs). It introduces a compilation approach that handles network layers whose type is not supported by the existing DCNN hardware accelerators, as well as layers assigned to run on multiple software processing units. The proposed solution is applied to ATONN, a DCNN compiler developed by STMicroelectronics that can already map networks onto hardware accelerators, and extends it with software mapping features. This requires new methods in the compiler's lowering phase, which prepare the intermediate representation graph of the network for the subsequent compilation stages, along with new code generation patterns, a parallelization detection strategy, and adaptable generation flows. To make the solution scalable, new functions and structures are introduced in the runtime environment, together with a new low-level software layer in the execution stack. Experiments are carried out on the ST Orlando System-on-Chip, with ST CubeAI as the software inference library, using both RTL and x86 simulations. They start with the execution of the state-of-the-art network VGG16 and continue with network topologies and setups that trigger the new code generation and runtime parallelization features, demonstrating that these features scale effectively. Further experiments on compilation and network execution exploit the co-design properties of the Orlando SoC, which allow mixed inference on hardware and software units simultaneously, and verify the functional behaviour.
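The hybrid mapping idea summarised in the abstract can be illustrated with a minimal sketch. It is not ATONN's implementation or API: the `Layer` class, the `HW_SUPPORTED` set, and the functions `assign_targets`, `parallel_stages`, and `schedule` are hypothetical names chosen for illustration, and grouping layers by dependency depth is only one simple way to detect layers that could run in parallel.

```python
from dataclasses import dataclass, field

# Hypothetical set of layer types the hardware accelerators can execute.
HW_SUPPORTED = {"conv2d", "maxpool", "relu"}


@dataclass
class Layer:
    name: str
    kind: str
    inputs: list = field(default_factory=list)  # names of producer layers
    target: str = ""                            # filled in by assign_targets


def assign_targets(layers):
    """Tag each layer 'hw' if its type is accelerator-supported, else 'sw'."""
    for layer in layers:
        layer.target = "hw" if layer.kind in HW_SUPPORTED else "sw"
    return layers


def parallel_stages(layers):
    """Group layers by dependency depth; layers in one stage are independent.

    Assumes `layers` is listed in topological order (producers first).
    """
    depth = {}
    for layer in layers:
        # Depth is the longest path from any input layer (inputs have depth 0).
        depth[layer.name] = 1 + max((depth[p] for p in layer.inputs), default=-1)
    stages = [[] for _ in range(max(depth.values(), default=-1) + 1)]
    for layer in layers:
        stages[depth[layer.name]].append(layer)
    return stages


def schedule(layers):
    """Return, per stage, the (name, target) pairs that could run concurrently."""
    assign_targets(layers)
    return [[(l.name, l.target) for l in stage] for stage in parallel_stages(layers)]


# Toy network: one convolution feeding three branches that are merged again.
network = [
    Layer("conv1", "conv2d"),
    Layer("branch_a", "softmax", ["conv1"]),
    Layer("branch_b", "conv2d", ["conv1"]),
    Layer("branch_c", "lrn", ["conv1"]),
    Layer("merge", "concat", ["branch_a", "branch_b", "branch_c"]),
]
plan = schedule(network)
```

In this toy network the middle stage holds one hardware layer and two software layers with no dependencies between them, which is the kind of situation where mixed inference on hardware and software units at the same time becomes possible.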
