Bringing perturbed affine arithmetic to MLIR

The rapid advancement of large language models (LLMs) has confronted computer scientists with a fundamental question: how can we address the exponentially increasing computational demands of their training and inference? Although significant progress has been made in developing hardware architectures optimized for artificial intelligence (AI), relying exclusively on hardware improvements is impractical, particularly for low-power embedded devices used for edge inference, where such advancements have limited impact. AI models typically rely on floating-point operations for both training and inference. A widely adopted software-based strategy to reduce computational requirements is to replace floating-point operations with fixed-point operations. However, fixed-point arithmetic requires accurate knowledge of the value ranges of variables; precise range estimation is therefore crucial to ensuring both correctness and efficiency. Conventional approaches, such as interval arithmetic, are provably safe but often yield excessively conservative bounds that significantly overshoot the true range of values. This overestimation can lead to suboptimal data allocation, increasing memory footprint and degrading performance. A more precise mathematical framework is thus required. Affine arithmetic offers a promising alternative, as it can capture correlations between variables and produce tighter bounds. This thesis proposes a hybrid approach to the value range analysis (VRA) problem, implemented within the MLIR framework, combining interval arithmetic and affine arithmetic to infer value ranges for fixed-point operations. While the proposed method incurs additional computational overhead for range analysis, this cost is negligible compared to the gains in precision and the resulting improvements in memory efficiency and performance.

The rapid development of large language models (LLMs) has posed a fundamental question to computer scientists: how can the exponential growth in computational resources required for their training and inference be addressed? Despite progress in hardware architectures optimized for artificial intelligence (AI), relying exclusively on hardware is impractical, especially for low-power embedded devices, where such improvements have little impact. Since AI models rely mainly on floating-point operations, a common software strategy is to replace them with fixed-point operations. This, however, requires accurate estimates of the value ranges of variables: an imprecise analysis can cause overflow (if the range is underestimated) or inefficiency and wasted memory (if it is overestimated). Classical approaches, such as interval arithmetic, are safe but often too conservative. Affine arithmetic is instead a promising alternative, able to capture correlations between variables and produce tighter bounds. This thesis proposes a hybrid approach to *value range analysis* (VRA), implemented in the MLIR framework, which combines interval and affine arithmetic to infer value ranges for fixed-point operations. Although the method introduces a slight computational overhead, this is negligible compared with the gains in precision and the improvements in memory efficiency and performance.
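
The difference between the two range representations can be illustrated on the expression `x - x`, whose true value is always zero. The following Python sketch is only an illustration under stated assumptions: it is not the MLIR implementation described in this thesis, it covers only subtraction of plain (unperturbed) affine forms, and the names `Interval`, `Affine`, and `integer_bits` are hypothetical helpers introduced here. Interval arithmetic treats the two operands as independent and reports a wide range, while the affine form tracks the shared noise symbol and cancels it.

```python
import math
from dataclasses import dataclass


@dataclass
class Interval:
    """Closed interval [lo, hi]; hypothetical helper for illustration."""
    lo: float
    hi: float

    def __sub__(self, other):
        # Operands are treated as independent, so the worst cases combine.
        return Interval(self.lo - other.hi, self.hi - other.lo)


@dataclass
class Affine:
    """Affine form: center + sum(coeff_i * eps_i), with each eps_i in [-1, 1]."""
    center: float
    terms: dict  # noise symbol name -> coefficient

    @staticmethod
    def from_range(lo, hi, symbol):
        # One fresh noise symbol describes the uncertainty of an input.
        return Affine((lo + hi) / 2, {symbol: (hi - lo) / 2})

    def __sub__(self, other):
        # Coefficients of shared noise symbols are subtracted, so
        # correlated uncertainty can cancel out.
        terms = dict(self.terms)
        for symbol, coeff in other.terms.items():
            terms[symbol] = terms.get(symbol, 0.0) - coeff
        return Affine(self.center - other.center, terms)

    def to_interval(self):
        radius = sum(abs(c) for c in self.terms.values())
        return Interval(self.center - radius, self.center + radius)


def integer_bits(rng):
    """Sign bit plus integer bits needed to cover the range (conservative)."""
    magnitude = max(abs(rng.lo), abs(rng.hi))
    return 1 + math.ceil(math.log2(magnitude + 1))


# x lies in [0, 4]; evaluate x - x with both representations.
x_interval = Interval(0.0, 4.0)
x_affine = Affine.from_range(0.0, 4.0, "eps_x")

interval_result = x_interval - x_interval
affine_result = (x_affine - x_affine).to_interval()

print(interval_result, integer_bits(interval_result))
print(affine_result, integer_bits(affine_result))
```

In this toy case the interval result is [-4, 4], while the affine result collapses to [0, 0], so a fixed-point format sized from the affine bound would need fewer integer bits. Nonlinear operations such as multiplication require an approximation step with an additional noise symbol, which this sketch deliberately leaves out.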