A methodology to augment RTL designs with online power monitoring capability

The Power Wall severely limits the performance scalability of current multi-cores, calling for new run-time low-power methodologies that enable aggressive power-saving techniques. Online power monitoring is the de facto solution for delivering accurate run-time power estimates of the architecture, and is therefore the key enabling factor for complex energy-performance resource allocation schemes. Performance counter-based monitoring solutions are the state-of-the-art online power modeling techniques, but they suffer from two major drawbacks. First, the statistics are collected from the architectural performance counters and fed into a power model that is usually implemented as a software subroutine; the computation needed to update the power estimate therefore degrades overall system performance. Second, performance counters are an optional architectural feature, so their implementation and their visibility at the software level cannot be taken for granted. Moreover, the standard performance counter-based monitoring solution cannot be readily applied to custom hardware accelerators, for which a software ISA may not even be defined.

This thesis introduces a novel RTL-level power monitoring methodology that addresses the limitations of the current state-of-the-art solutions. The proposed methodology uses the Value Change Dump (VCD) information and the post-synthesis power traces of a target architecture to automatically derive the power model. The RTL of the target architecture is then augmented with the RTL description of the identified power model, whose power estimates are made available to the software level through a special-purpose register. The power prediction is computed by additional ad hoc hardware that is tightly coupled with the target architecture, thus avoiding the use of any performance counter. Moreover, system performance is not affected by the power monitoring infrastructure, and both the area and power overheads are limited compared with state-of-the-art solutions.

The proposed methodology has been validated against a fully compliant OpenRISC 1000 implementation. The results show an average prediction error below 10%, with power and area overheads limited to 6.89% and 4.71%, respectively.
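As a rough illustration of the model-derivation step, the sketch below fits a power model from per-window signal toggle counts (the kind of activity information a VCD carries) and matching power samples. It assumes a linear model fitted by ordinary least squares, which the abstract does not specify. The function names `toggle_counts`, `fit_linear`, and `predict`, the `(timestamp, signal)` event format, and the fixed window size are all hypothetical and are not taken from the thesis.

```python
# Minimal sketch, NOT the thesis's actual flow.
# Assumption: power per time window is modeled as
#   P = c0 + sum_i(c_i * toggles_i)
# where toggles_i is the number of value changes of signal i in that window.


def toggle_counts(events, signals, window, n_windows):
    """Count value changes per signal in each time window.

    events: iterable of (timestamp, signal_name) pairs, one per value change
            (hypothetical pre-parsed form of VCD data).
    Returns one row per window, one column per entry of `signals`.
    """
    index = {name: i for i, name in enumerate(signals)}
    rows = [[0] * len(signals) for _ in range(n_windows)]
    for t, name in events:
        w = int(t // window)
        if name in index and 0 <= w < n_windows:
            rows[w][index[name]] += 1
    return rows


def fit_linear(X, y):
    """Ordinary least squares with intercept, solved via the normal equations.

    Returns [c0, c1, ..., ck] for k signal columns.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    n = len(A[0])
    # Augmented normal-equation matrix (A^T A | A^T y).
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(n)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular system: columns are linearly dependent")
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predict(coeffs, row):
    """Evaluate the fitted model on one window's toggle counts."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))
```

Under this assumed model form, the fitted coefficients would be the only values that need to reach hardware: the estimate for each window is a weighted sum of toggle counts, which is the kind of computation a small block of dedicated logic can perform without software involvement.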

The Power Wall limits the growth in computational power of current multi-cores, in both the embedded and the High Performance Computing (HPC) domains. In general, the computational capacity of multi-core architectures is often over-provisioned to guarantee greater flexibility across different application scenarios. However, this practice can lead to under-utilization of the computational resources and a consequent waste of energy. In particular, design-time power optimization techniques achieve only a trade-off between the energy consumption and the performance of the architecture.

Run-time power optimization techniques, by contrast, yield significant improvements because they act on the actual operating conditions of the system. Online power monitoring is a set of techniques that enable run-time monitoring of the consumed power and that form the basis of every run-time power optimization scheme. Identifying power models by means of performance counters represents the state of the art in online power monitoring, although these solutions suffer from two limitations. First, the statistics are collected from the architectural performance counters and supplied to the power model, which is generally implemented in software. Part of the computational power is therefore spent updating the power consumption prediction, degrading system performance. Moreover, these solutions require the execution of user-defined software, which limits their use in hardware accelerators that do not expose an Instruction Set Architecture (ISA) to the software programmer. Second, performance counters are optional hardware structures, whose implementation in the final architecture is therefore not guaranteed.

This thesis presents an online power monitoring methodology that instruments the identified power model directly in the RTL of the target architecture, resolving the two limitations of state-of-the-art solutions. The proposed methodology uses the information contained in the Value Change Dump (VCD) and the power traces obtained from post-synthesis simulations to automatically derive the mathematical function that describes the power consumption of the architecture. The identified power predictor model is then transformed into an RTL description that is automatically added to that of the target architecture. The power consumption prediction is computed by dedicated hardware structures, thus eliminating any performance overhead. Moreover, performance counters are not used, so the proposal can be applied to any architecture for which the RTL description is available.

The proposed methodology has been validated on an architectural implementation fully compliant with the OpenRISC 1000 specification. The results show an average prediction error below 11%, with area and power overheads of 6.89% and 4.71%, respectively.