A methodology for error simulation in convolutional neural networks executed on GPU

Nowadays, there is growing interest in employing Convolutional Neural Networks (CNNs) in safety-critical systems, since CNNs achieve higher accuracy in perception tasks than traditional Computer Vision (CV) algorithms. CNNs are generally executed on Graphics Processing Units (GPUs), whose parallel architectures speed up CNN inference. This acceleration lets the application meet the strict requirements of safety-critical systems, especially timing requirements. As a result, CNNs executed on GPUs are increasingly used in safety-critical systems. Therefore, we must ensure that this combination functions properly in every possible situation, including when faults occur in the digital hardware. Reliability analysis studies how systems behave when faults occur. Its goal is to determine whether the system keeps working correctly by autonomously handling the resulting errors, or whether it fails and produces a wrong result. In our context, the most insidious threats are faults caused by environmental conditions, called soft errors. Soft errors are not destructive, but they induce transient effects that corrupt the state of the system. For instance, a soft error may change the value of a bit stored in a memory cell, which causes an error when that cell is read. The activation of a soft error may therefore make the application deviate from its expected behavior. We thus need to understand how a CNN behaves when soft errors are activated and how far it deviates from its nominal behavior. This matters because the CNN's outputs may feed decision-making systems whose choices directly affect the safety of users. Traditionally, the literature has addressed the reliability analysis of CNNs executed on GPUs through architectural fault injection.
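The following sketch shows why a single upset can matter. It flips one bit of an IEEE 754 single-precision value, the format commonly used for CNN activations, and inspects the result. It is a minimal Python illustration only. It is not how an architectural fault injector operates, and the function name `flip_bit_float32` is hypothetical.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` stored as float32 with bit `bit` (0 = LSB, 31 = sign) inverted."""
    if not 0 <= bit < 32:
        raise ValueError("bit index must be in [0, 31]")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit  # a single-event upset inverts exactly one stored bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


if __name__ == "__main__":
    for bit in (22, 23, 30, 31):
        print(bit, flip_bit_float32(1.0, bit))
    # 22 1.5   (most significant mantissa bit)
    # 23 0.5   (least significant exponent bit)
    # 30 inf   (most significant exponent bit)
    # 31 -1.0  (sign bit)
```

The impact depends heavily on which bit is hit. A mantissa flip can be almost harmless, while an exponent flip can turn an activation into infinity. This range of outcomes is why the effects observed at the CNN level are not simple to predict.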
Architectural fault injection for GPUs is a technique that emulates the activation of soft errors within the architecture of the device. Faults are injected by inserting bitflips in the GPU data-path, with effects similar to those caused by the physical event. Although very accurate, architectural fault injection imposes severe constraints on the system under analysis. First, the techniques the fault injector uses to emulate faults slow down the application. This slowdown may prevent the application from meeting the timing constraints of the whole system. Secondly, integrating the fault injector with the application is challenging because the application must be modified to let the fault injector operate. The modification may require recompiling the source code, which can be difficult if the application uses closed-source libraries. As an alternative to fault injection, reliability analysis can also be performed through error simulation. Error simulation is a technique that simulates the effects of soft-error activations directly in the source code of the application running on the GPU, by corrupting one or more of the application's values according to error models. An error simulator is far easier to implement than a fault injector because it can be integrated directly within the Machine Learning (ML) frameworks commonly used to develop CNNs. The main issue with error simulation concerns the error models used to corrupt the application. They must reproduce the effects of physical faults occurring in the underlying hardware, so they must be validated. Error models that are not validated risk introducing errors into the system that do not correspond to reality, leading to incorrect conclusions.
In the literature, we do not find any validated error models for CNNs executed on GPUs, since most works focus on architectural fault injection. For these reasons, the purpose of this thesis is to define a methodological framework for application-level error simulation in CNNs using validated error models. The framework connects two abstraction levels. The first is the GPU architecture, where faults are generally emulated. The second is the CNN, where the program's behavior is analyzed to evaluate the effects of the faults. First, we have defined a methodology to create the error models, which enables us to derive validated error models for each single CNN operator. We have performed several architectural fault injection campaigns targeting single CNN operators, obtaining thousands of faulty outputs caused by the activation of the soft errors injected during the campaigns. The error models are built by analyzing these faulty outputs according to three parameters: the number of corrupted values, their domains, and their spatial pattern. Each parameter is described statistically, which enables us to recreate any of the observed faulty outputs by drawing each parameter from its distribution. The error models are thus validated by construction, because they are derived from the analysis of the faulty outputs. Nonetheless, we further compare their effects with those obtained using the state-of-the-art, publicly available GPU fault injector. Besides error modeling, the framework proposes an approach for performing error simulation campaigns on CNNs executed on GPUs. This approach sabotages the output of a CNN operator according to the error models defined above. Error simulation allows a higher degree of integration with the application.
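The following Python sketch makes the three parameters concrete. It builds an empirical error model from pairs of golden (fault-free) and faulty operator outputs and then draws a new error from it. It is a hedged illustration, not the thesis implementation, and it rests on several assumptions:

- It works on a single 2-D output channel.
- The pattern classes (`single_point`, `same_row`, `same_column`, `scattered`), the value domains (`nan_or_inf`, `zero`, `small_delta`, `large_delta`) and the `threshold` are hypothetical placeholders.
- The spatial pattern is drawn conditionally on the number of corrupted values, so a sampled pattern is always consistent with the sampled count.

```python
import math
import random
from collections import Counter, defaultdict
from dataclasses import dataclass, field

Grid = list[list[float]]  # one 2-D output channel of an operator (simplification)


def corrupted_positions(golden: Grid, faulty: Grid) -> list[tuple[int, int]]:
    """(row, col) of every value that differs between the golden and faulty output."""
    return [
        (r, c)
        for r, (g_row, f_row) in enumerate(zip(golden, faulty))
        for c, (g, f) in enumerate(zip(g_row, f_row))
        if not (g == f or (math.isnan(g) and math.isnan(f)))
    ]


def spatial_pattern(positions: list[tuple[int, int]]) -> str:
    """Hypothetical pattern classes; the actual taxonomy may differ."""
    if len(positions) == 1:
        return "single_point"
    if len({r for r, _ in positions}) == 1:
        return "same_row"
    if len({c for _, c in positions}) == 1:
        return "same_column"
    return "scattered"


def value_domain(golden: float, faulty: float, threshold: float = 1.0) -> str:
    """Hypothetical value domains; `threshold` is an arbitrary illustrative value."""
    if math.isnan(faulty) or math.isinf(faulty):
        return "nan_or_inf"
    if faulty == 0.0:
        return "zero"
    return "small_delta" if abs(faulty - golden) <= threshold else "large_delta"


def _draw(counter: Counter, rng: random.Random):
    """Draw a key with probability proportional to its observed frequency."""
    return rng.choices(list(counter), weights=list(counter.values()), k=1)[0]


@dataclass
class ErrorModel:
    count: Counter = field(default_factory=Counter)  # P(number of corrupted values)
    pattern: defaultdict = field(default_factory=lambda: defaultdict(Counter))  # P(pattern | count)
    domain: Counter = field(default_factory=Counter)  # P(domain) of each corrupted value

    def observe(self, golden: Grid, faulty: Grid) -> None:
        """Update the empirical distributions with one fault injection outcome."""
        positions = corrupted_positions(golden, faulty)
        if not positions:  # fault masked or not activated: no error to model
            return
        self.count[len(positions)] += 1
        self.pattern[len(positions)][spatial_pattern(positions)] += 1
        for r, c in positions:
            self.domain[value_domain(golden[r][c], faulty[r][c])] += 1

    def sample(self, rng: random.Random) -> tuple[int, str, list[str]]:
        """Draw (number of values, spatial pattern, per-value domains) for one error."""
        if not self.count:
            raise ValueError("no faulty outputs observed yet")
        n = _draw(self.count, rng)
        return n, _draw(self.pattern[n], rng), [_draw(self.domain, rng) for _ in range(n)]
```

Two further simplifications apply. Domains are drawn independently for each value, and the sketch does not place the sampled values at concrete positions of the target tensor.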
This higher integration lets us speed up error campaigns compared to current practice. We have then implemented the framework, obtaining a repository of error models and an error simulator tool. For the sake of demonstration, the repository contains the error models of 11 CNN operators, such as Convolution, Batch Normalization, and Leaky ReLU. The repository can be extended by applying the same error modeling approach to other operators. Each model consists of three probability distributions, one for each parameter. The error simulator is a tool designed to corrupt the outputs of CNN operators. The corruption, i.e., the insertion of errors, follows the error models in the repository. The tool is built upon TensorFlow, the ML framework used to develop the CNN. The error simulator also features advanced injection techniques, such as checkpointing and extensive use of caching. These optimizations let the tool reuse intermediate computations, achieving execution times close to native execution. Finally, we have compared our framework with two baselines in real case studies. The first baseline is SASSIFI, the state-of-the-art GPU fault injector developed by NVIDIA. Against SASSIFI, we have compared both execution times and the accuracy of our error models. For this comparison, we have used the TensorFlow implementation of YOLO V3, a state-of-the-art CNN for object detection. We have simulated 137,000 errors with our error simulator in 15 hours. To obtain the same number of errors with SASSIFI, we had to inject 360,000 faults, because most of the injected faults were not activated. SASSIFI required 92 hours overall, so our error simulator completes the same campaign about 6.1x faster than SASSIFI.
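Checkpointing and caching let the simulator reuse intermediate computations, and the following sketch shows the idea. It is a minimal illustration that assumes the network is a linear chain of pure operators. Real CNN graphs built with TensorFlow can branch, and this is not the tool's actual implementation. The fault-free run stores every operator's output. Each simulated error then restarts from the cached output of the targeted operator, so only the downstream operators are re-executed. The class `CheckpointedNet` and the helper `set_value` are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence

Tensor = List[float]
Op = Callable[[Tensor], Tensor]


class CheckpointedNet:
    """Linear chain of pure operators with cached (checkpointed) golden outputs."""

    def __init__(self, ops: Sequence[Op]):
        self.ops = list(ops)
        self.golden_outputs: Optional[List[Tensor]] = None
        self.op_calls = 0  # operator executions, counted to show the reuse

    def golden_run(self, x: Tensor) -> Tensor:
        """Fault-free inference that stores every operator output as a checkpoint."""
        outputs = []
        for op in self.ops:
            x = op(x)
            self.op_calls += 1
            outputs.append(x)
        self.golden_outputs = outputs
        return x

    def inject(self, op_index: int, corrupt: Callable[[Tensor], Tensor]) -> Tensor:
        """Corrupt the cached output of ops[op_index], then run only the downstream ops."""
        if self.golden_outputs is None:
            raise RuntimeError("call golden_run() first")
        x = corrupt(list(self.golden_outputs[op_index]))  # copy keeps the cache intact
        for op in self.ops[op_index + 1:]:
            x = op(x)
            self.op_calls += 1
        return x


def set_value(index: int, value: float) -> Callable[[Tensor], Tensor]:
    """Hypothetical corruption: overwrite one element (a real error model draws more)."""
    def corrupt(t: Tensor) -> Tensor:
        t[index] = value
        return t
    return corrupt


if __name__ == "__main__":
    net = CheckpointedNet([
        lambda v: [2.0 * a for a in v],      # stand-in for a convolution
        lambda v: [max(0.0, a) for a in v],  # ReLU
        lambda v: [sum(v)],                  # stand-in for a final reduction
    ])
    print(net.golden_run([1.0, -2.0, 3.0]), net.op_calls)  # [8.0] 3
    print(net.inject(0, set_value(1, 10.0)), net.op_calls)  # [18.0] 5
    print(net.inject(1, set_value(0, -5.0)), net.op_calls)  # [1.0] 6
```

The `op_calls` counter shows the saving: each injection re-executes only the operators after the target instead of the whole network. The price is the memory needed to keep every intermediate output cached.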
For these 137,000 errors, we have analyzed the effects they produce on the output of YOLO V3. These effects match those generated by SASSIFI in 98.72% of the cases. The second baseline is TensorFI, a recent error simulator for the reliability analysis of CNNs. For this comparison, we could not use YOLO V3 because of technical limitations and design flaws in TensorFI. Instead, we have used the LeNet-5 model for the MNIST dataset and a model for the CIFAR10 dataset, both performing object classification in images. These models are significantly smaller than YOLO V3 and allow TensorFI to be tested. Our error simulator simulated 10,000 errors in 24.74 seconds for LeNet-5 and 37.62 seconds for CIFAR10. TensorFI needed 1098.71 and 2409.47 seconds, respectively, for the same number of errors. The speedup of our tool over TensorFI therefore ranges from 44.41x to 64.04x. In addition, TensorFI embeds error models that are not validated and differ substantially from those we have observed. They are not probabilistic and are directly inherited from the fault models used in architectural fault injection. Therefore, reliability analyses performed with TensorFI cannot be trusted. The errors observed in the output of a single CNN operator are far more complex than the single bitflip used in architectural fault injection. Hence, the fault models of architectural fault injection are not valid at the application level. In conclusion, we have shown that our error models are validated both by construction and by comparison. Moreover, our error simulator is faster than current state-of-the-art tools, achieving execution times close to native execution.

Nowadays, CNNs are increasingly used in safety-critical systems because they achieve higher accuracy in vision tasks than traditional CV algorithms. CNNs are executed on GPUs because their parallel architecture accelerates CNN inference. Accelerating CNNs on GPUs is necessary to meet the strict requirements of safety-critical systems, especially timing constraints. The pairing of a CNN executed on a GPU is increasingly common in safety-critical systems. Because of its complex nature, its correct operation must be ensured in every possible situation, including when faults occur in the digital systems. Reliability analysis studies the behavior of systems in the presence of faults. The goal is to determine whether the system can autonomously handle the occurrence of faults or whether it fails, producing a wrong result. In our context, the greatest threats are faults caused by environmental factors and conditions, called soft errors. Soft errors have no destructive or permanent effects, but they cause transient faults that corrupt the state of the system. Indeed, a soft error can flip a bit stored in a memory cell, which produces an error if the cell is read. The activation of a soft error, that is, the reading of the corrupted bit, can make the application behave differently from what is expected. We therefore need to understand how CNNs behave when soft errors are activated and quantify how far they deviate from the expected behavior. This is essential because the results produced by the CNN may be used by decision-making systems whose choices affect the safety of users.
Traditionally, the literature has performed the reliability analysis of CNNs executed on GPUs through architectural fault injection techniques. Architectural fault injection on GPUs emulates the activation of soft errors within the GPU architecture. Faults are injected by inserting bitflips in the GPU data-path, with effects analogous to those caused by the physical event. Although very accurate, architectural fault injection imposes numerous constraints on the system under analysis. The techniques injectors use to emulate faults slow down the application as a side effect. The slowdown can make the application violate the timing constraints of the overall system. Secondly, integrating the fault injector with the application is complex because the application must be modified to let the injector operate. Modifying the application may require changing its source code, which is not always feasible with closed-source libraries. As an alternative to fault injection, reliability analysis can be performed through error simulation. Error simulation is a technique that simulates the effects of soft-error activations directly in the source code of the application executed on the GPU. It corrupts one or more of the application's data values according to error models. An error simulator is much simpler to build than an architectural fault injector because it can be integrated directly into the ML libraries used to write CNNs. The main problem with error simulation concerns the validation of its error models, since they must reproduce the physical effects that occur in the underlying device.
If the error models are not validated, they risk introducing errors into the system that do not correspond to reality, leading to an incorrect analysis. In the literature, we find no validated error models for CNNs executed on GPUs, since most works focus on architectural fault injection. The purpose of this thesis is to define a methodological framework for application-level error simulation on a CNN using validated error models. The framework connects the abstraction level of the GPU, where faults are emulated, with that of the CNN, where the behavior in the presence of such faults is analyzed. First, we defined a methodology to create validated error models for each single CNN operator. Then, we performed numerous architectural fault injection campaigns on single CNN operators, obtaining thousands of corrupted results. These results originate from the activation of the soft errors injected during the campaigns. The error models are defined by analyzing the corrupted results according to three parameters: the number of corrupted values, their domains, and their spatial pattern. The model follows a statistical approach: the observed corrupted results can be recreated from the probability distribution of each parameter. The error models are therefore validated by construction, since they are derived from the analysis of the corrupted results. Nonetheless, we offer a further assessment of our error models by comparing their effects with those of the best architectural fault injector for GPUs. The second contribution of the framework is an approach for carrying out error simulation campaigns on CNNs executed on GPUs. The approach sabotages the output of a CNN operator according to the error models defined above.
Error simulation achieves a higher degree of integration with the application and can therefore run error campaigns faster than current practices. The framework was then implemented, yielding a collection of error models and an error simulation tool. For demonstration purposes, we populated the collection with error models for 11 CNN operators, such as Convolution, Batch Norm, and Leaky ReLU. The collection remains open to future extensions covering the remaining operators. The error simulator is a tool designed to corrupt the output of a CNN operator by inserting errors according to the models in the collection. The simulator is based on the TensorFlow ML library. It integrates advanced injection techniques, such as check-pointing and extensive use of caching, which keep its execution times close to native ones. Finally, we compared our framework with two reference tools in the field on real use cases. The first comparison involves SASSIFI, the state-of-the-art architectural fault injector for GPUs, developed by NVIDIA. With SASSIFI, we compared the execution time of the same error campaign and the accuracy of the error models. The subject of this comparison was YOLO V3, a state-of-the-art CNN for object detection. With our tool, we simulated 137,000 errors in 15 hours. To obtain the same number of errors with SASSIFI, we injected 360,000 faults, since most of them were not activated. SASSIFI required 92 hours in total, so our simulator performed the same error campaign 6.1 times faster.
For these 137,000 errors, we analyzed the effects on the output of YOLO V3 for both our simulator and SASSIFI. Our simulator was able to generate 98.72% of the effects produced by SASSIFI. The second comparison involves an emerging error simulator for CNNs called TensorFI. Because of technical limitations and design flaws, it was not possible to use YOLO V3 with TensorFI. Instead, we used two CNNs: LeNet-5 for the MNIST dataset and a custom implementation for CIFAR10. Both networks perform object classification in images. Our simulator handled 10,000 errors in 24.74 seconds for LeNet-5 and 37.62 seconds for CIFAR10. The same campaigns with TensorFI required 1098.71 and 2409.47 seconds, respectively. TensorFI integrates error models that are not probabilistic, imported directly from the fault models used in architectural injection. Therefore, reliability analyses performed with TensorFI are not realistic. The errors observed in the output of a single CNN operator are much more complex than the bitflips used at the architectural level. Hence, architectural fault models are not valid at the application level. In conclusion, we have shown that our error models are validated both by construction and by comparison. Moreover, our simulator is faster than the tools that currently represent the state of the art, achieving execution times very close to native ones.