Modeling and exploration of AI accelerators based on digital in-memory-computing
de Gennaro, Valeria
2025/2026
Abstract
The design and optimization of AI hardware accelerators are becoming increasingly challenging. Modern deep learning models are growing larger and more complex, requiring massive computational resources. At the same time, edge devices impose strict constraints on energy consumption, latency, and physical resources. These opposing factors create a growing need for innovative solutions that can balance model complexity with the limitations of edge hardware. Among the emerging solutions, In-Memory Computing (IMC) has gained significant attention thanks to its intrinsic compatibility with convolution operations, which enables efficient multiply–accumulate (MAC) acceleration. By performing computations directly within the memory arrays, IMC effectively mitigates the data transfer bottleneck between memory and processing units (a major limitation of Von Neumann architectures) while exploiting the high data reuse opportunities of convolutional workloads. Modeling and evaluating such accelerators to minimize energy consumption and latency while staying within strict memory and computational resource budgets remains a key research problem addressed by recent state-of-the-art studies. This thesis proposes a methodology for design space exploration (DSE) of neural network accelerators based on IMC technology, with a specific focus on convolutional workloads. The proposed framework integrates and explores existing algorithms and methodologies, enhanced with dedicated optimizations, to efficiently identify a set of Pareto-optimal design configurations that balance performance, energy, and resource utilization across multiple workloads under given system constraints. Experimental evaluations reveal an effective evolutionary process that identifies configurations with an optimal energy–latency balance relative to other comparable architectures.
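The MAC acceleration that makes IMC attractive for convolutions can be illustrated with a minimal sketch. This toy model (not the thesis framework; all names here are hypothetical) treats a digital IMC macro as a weight-stationary crossbar: weights stay in the memory array, per-column multiply–accumulate happens in place, and only inputs and outputs cross the memory boundary.

```python
# Illustrative toy model of a digital IMC crossbar, assuming an idealized
# bit-parallel macro with one adder tree per column. Not the thesis framework.

def imc_crossbar_mac(weights, inputs):
    """weights: rows x cols matrix stored in the array; inputs: one value per row.
    Returns one accumulated MAC result per column (weight-stationary dataflow)."""
    rows, cols = len(weights), len(weights[0])
    assert len(inputs) == rows
    # Each column accumulates inputs[r] * weights[r][c] locally, mimicking
    # the per-column adder trees of a digital IMC macro.
    return [sum(inputs[r] * weights[r][c] for r in range(rows))
            for c in range(cols)]

# A tiny convolution step: the kernel is unrolled into one crossbar column
# per output channel, and the input patch is streamed along the rows.
w = [[1, -1], [0, 2], [3, 1]]   # 3 unrolled weights, 2 output channels
x = [2, 1, 1]                   # unrolled input patch
print(imc_crossbar_mac(w, x))   # -> [5, 1]
```

Because the weights never move, the same stored kernel is reused for every input patch, which is exactly the data-reuse pattern of convolutional workloads that the abstract refers to.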
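The Pareto-optimality criterion at the core of such a multi-objective DSE loop can also be sketched briefly. The candidate tuples below are hypothetical `(energy, latency)` pairs; the framework's actual cost models and evolutionary operators are not reproduced here.

```python
# Minimal sketch of a Pareto-dominance filter for multi-objective DSE,
# minimizing both objectives. Candidate values are invented for illustration.

def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only configurations not dominated by any other candidate."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Candidate accelerator configurations as (energy [mJ], latency [ms]) pairs:
candidates = [(3.0, 8.0), (2.0, 9.0), (4.0, 7.0), (3.5, 8.5), (5.0, 10.0)]
print(sorted(pareto_front(candidates)))  # -> [(2.0, 9.0), (3.0, 8.0), (4.0, 7.0)]
```

In an evolutionary DSE flow, a filter of this kind is applied each generation: dominated configurations are discarded, and the surviving front seeds the next round of mutation and recombination.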
| File | Size | Format |
|---|---|---|
| 2025_12_deGennaro_Tesi.pdf (open access) | 3.47 MB | Adobe PDF |
| 2025_12_deGennaro_Executive Summary.pdf (open access) | 882.86 kB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/246432