Online power modeling, monitoring and optimization for mobile computing platforms

The Internet-of-Things (IoT) revolution has brought new challenges and opportunities for computational efficiency. Embedded devices must execute multiple applications, and the available computing power has to be distributed among them at run-time. Such complex hardware platforms have to sustain the continuous acquisition and processing of data under severe energy budget constraints, since most of them are battery powered. The state of the art offers several ad hoc contributions that selectively optimize performance with respect to aspects such as energy, power, temperature, or reliability. In this scenario, hardware-level online power monitors are crucial to support the run-time power optimizations required to meet the ever-increasing demand for energy efficiency. To be effective under time-to-market pressure, these requirements must be considered already during the design of the power monitoring infrastructure. This thesis presents a power model identification and implementation strategy with two main advantages over the state of the art. First, the proposed solution trades the accuracy of the power model against the amount of resources allocated to the power monitoring infrastructure. Second, an automatic power model instrumentation strategy ensures a timely implementation of the power monitor regardless of the complexity of the target computing platform. To assess the effectiveness of the proposed solution, the identified power monitor has been used to feed a power optimization scheme built on a PID controller from control theory. Both single-core and multi-core scenarios have been taken into consideration. The online power monitor has been validated against 8 accelerators generated through a High-Level-Synthesis flow and against a more complex RISC-V embedded computing platform.
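The accuracy-versus-resources trade-off described above can be illustrated with a minimal sketch. It is not the identification procedure of the thesis: it assumes a linear power model over per-window signal toggle counts, uses a plain greedy forward selection as a stand-in for the constrained identification, and runs on synthetic data. All names (`synthetic_trace`, `fit`, `select_signals`, `max_signals`) and all numeric weights are hypothetical.

```python
import random

def synthetic_trace(n_samples=200, seed=0):
    """Hypothetical data: toggle counts of 6 signals per time window and the
    resulting power, generated from an assumed noise-free linear model."""
    rng = random.Random(seed)
    true_w = [0.05, 0.03, 0.004, 0.003, 0.002, 0.001]  # assumed weights
    toggles = [[rng.randint(0, 100) for _ in true_w] for _ in range(n_samples)]
    power = [5.0 + sum(w * t for w, t in zip(true_w, row)) for row in toggles]
    return toggles, power

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit(toggles, power, signals):
    """Least-squares fit of power ~ w0 + sum(w_i * toggles_i) restricted to the
    monitored signals. Returns (weights, mean absolute relative error)."""
    rows = [[1.0] + [sample[s] for s in signals] for sample in toggles]
    k = len(signals) + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * p for r, p in zip(rows, power)) for i in range(k)]
    w = solve(ata, atb)
    est = [sum(wi * ri for wi, ri in zip(w, r)) for r in rows]
    err = sum(abs(e - p) / p for e, p in zip(est, power)) / len(power)
    return w, err

def select_signals(toggles, power, max_signals):
    """Greedy forward selection: repeatedly add the signal that lowers the
    error most, until the resource budget (number of monitored signals) is hit."""
    chosen, remaining = [], list(range(len(toggles[0])))
    while remaining and len(chosen) < max_signals:
        best = min(remaining, key=lambda s: fit(toggles, power, chosen + [s])[1])
        chosen.append(best)
        remaining.remove(best)
    return chosen, fit(toggles, power, chosen)[1]

toggles, power = synthetic_trace()
for budget in (1, 2, 6):
    signals, err = select_signals(toggles, power, budget)
    print(f"budget={budget} signals={signals} mean relative error={err:.4%}")
```

In this toy setting a smaller budget means fewer counters to instrument at the cost of a larger estimation error, which is the kind of trade-off a user-defined constraint would steer.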
The all-digital power optimization scheme has been validated on the nu+ processor, a 16-way SIMD processor with a configurable number of cores. To assess the proposed control scheme, this thesis considers the four-core configuration running 20 applications from the WCET benchmark suite. As for the power monitor, depending on the user-defined constraints and compared with unconstrained state-of-the-art power monitoring solutions, the proposed methodology shows a resource saving between 37.3% and 81%, while the maximum average accuracy loss stays within 5% at the aggressive 20 µs temporal resolution. When the temporal resolution is relaxed toward the values proposed in the state of the art, i.e., hundreds of microseconds, the average accuracy loss of the power monitors is lower than 1% with almost the same overheads. In addition, the presented solution demonstrates that a resource-constrained power monitor can be delivered at a 20 µs temporal resolution, i.e., far finer than the one used by current state-of-the-art solutions. The power optimization scheme, in turn, shows an overhead limited to 0.86% (FFs) and 5.3% (LUTs) of the FPGA chip. The performance results are analyzed considering three quality metrics. First, the efficiency in exploiting the imposed budget (EFFg), which is 98.27% on average. Second, the overflow of the actual average power consumption with respect to the assigned budget (OVFg), which is limited to 1.43 mW on average. Last, the performance utility loss due to the control scheme, which is limited to 1.87% on average.
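To make the control loop and the budget metrics concrete, the following is a minimal sketch of a PID-based power capping loop. It rests on assumptions that the abstract does not state: the actuator is taken to be a clock frequency knob, the plant is a toy linear power model, the derivative gain is set to zero in the demo, and the efficiency and overflow formulas are plausible readings of the metric descriptions rather than the thesis definitions. Every name and constant here is hypothetical.

```python
class PID:
    """Textbook discrete PID controller with output clamping."""

    def __init__(self, kp, ki, kd, out_min, out_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(out, self.out_min), self.out_max)

def toy_power_mw(freq_mhz):
    """Assumed plant: static power plus a term proportional to frequency."""
    return 20.0 + 0.5 * freq_mhz

def run_power_capping(budget_mw, steps, start_freq_mhz=10.0):
    """Closed loop: the monitor reports power, the PID picks the next frequency."""
    pid = PID(kp=0.2, ki=0.4, kd=0.0, out_min=10.0, out_max=200.0)
    freq_mhz = start_freq_mhz
    trace = []
    for _ in range(steps):
        measured = toy_power_mw(freq_mhz)  # stands in for the online power monitor
        trace.append(measured)
        freq_mhz = pid.step(budget_mw, measured, dt=1.0)  # dt in control periods
    return trace

def budget_efficiency(trace, budget_mw):
    """Assumed definition: share of the budget used by the average power, in percent."""
    avg = sum(trace) / len(trace)
    return 100.0 * min(avg, budget_mw) / budget_mw

def budget_overflow(trace, budget_mw):
    """Assumed definition: amount by which the average power exceeds the budget, in mW."""
    avg = sum(trace) / len(trace)
    return max(0.0, avg - budget_mw)

trace = run_power_capping(budget_mw=60.0, steps=500)
print(f"final power: {trace[-1]:.3f} mW")
print(f"efficiency:  {budget_efficiency(trace, 60.0):.2f} %")
print(f"overflow:    {budget_overflow(trace, 60.0):.3f} mW")
```

With these gains the toy loop approaches the budget from below, so the efficiency falls short of 100% only because of the start-up transient and the average power does not overflow the budget. A real platform would need gains tuned to its own power dynamics.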

The Internet of Things (IoT) revolution has given rise to new challenges and opportunities to reach new levels of computational efficiency. Embedded devices are pushed ever further in order to handle applications in which high performance must sometimes give way to energy-saving requirements. Since these devices are often battery powered, they frequently have to acquire and process data under stringent energy requirements. The literature offers many solutions to optimize the energy consumption of such devices while maintaining a high level of performance. In this scenario, hardware power meters prove crucial to support effective run-time power management techniques. To be effective and to shorten time to market, these requirements must be considered from the earliest design stages. This thesis presents a methodology for identifying and implementing power monitors with two main advantages over the literature. First, the approach manages the trade-off between an accuracy metric of the identified model and the corresponding overhead. Second, by being fully automatic, the methodology significantly reduces implementation time. To verify the effectiveness of the methodology, both single-core and multi-core processor scenarios have been considered, using actuators built according to control theory. The effectiveness of the power meters is verified in this work using eight hardware accelerators and a System on Chip that implements a RISC-V processor.
As for the accuracy of the actuators, a 16-lane SIMD processor with a configurable number of active cores is considered, on which twenty benchmarks from the WCET suite are executed. As for the power monitors, depending on the user-imposed constraints and compared with the unconstrained version, the methodology yields a resource saving ranging from 37.3% to 81%, while the accuracy loss stays below 5% at the finest temporal resolution (20 µs). When this temporal resolution is relaxed to the values found in the literature (hundreds of microseconds), the accuracy loss stays below 1%. The methodology also has the advantage of reaching very fine temporal resolutions, which are useful when the power consumption varies very quickly. The power control scheme, in turn, has an overhead of 0.86% (FFs) and 5.3% (LUTs). Three metrics have been defined in this work to evaluate the effectiveness of the presented scheme: the efficiency is 98.27%, the overflow is 1.43 mW, and the utility loss is 1.87%.