A mixed-signal integrated circuit based on phase change memory synapses for deep neural accelerators

MUÑOZ MARTÍN, IRENE
2020/2021

Abstract

Nowadays, machine learning powers many different consumer applications, where artificial intelligence is supported by statistical models and mathematical algorithms that allow computer systems to perform specific tasks accurately. Deep neural networks (DNNs) have dramatically enhanced classification and recognition by exploiting a general-purpose learning procedure in a multi-layer architecture. However, these architectures have several limitations. First, training and inference of DNNs on standard digital systems are time-consuming and power-hungry. Second, a trained DNN cannot adapt to a constantly changing environment: biological organisms steadily acquire and modulate knowledge about the environment in which they live (lifelong learning), while DNNs suffer from catastrophic forgetting whenever new data are learnt. Because of these limitations, the scientific community and industry are looking for novel methods to improve the performance and efficiency of DNNs.

By taking advantage of innovative computing approaches such as in-memory computing, the training and testing of DNNs could be greatly improved in terms of speed and power efficiency. In-memory computing emerges as a very effective way of overcoming the limitations of typical von Neumann architectures, since it massively parallelizes operations and performs calculations where the data are stored, avoiding the so-called "von Neumann bottleneck". In particular, in-memory computing requires memory elements capable of storing data and performing calculations at the same time. Emerging non-volatile memories (NVMs), such as phase change memory (PCM) and resistive switching RAM (RRAM), meet these requirements: they are small and exhibit fast switching, multilevel capability, low-voltage operation, and time-dependent dynamics. In addition, they can be arranged in array architectures. Since the main operations in DNNs are dense matrix-matrix multiplications, trained weights can be mapped onto NVM crossbar arrays as conductance values, exploiting Ohm's and Kirchhoff's laws to perform matrix-vector multiplication (MVM). These advances could outperform current GPUs and CPUs in power consumption and speed, since MVM carries out all multiply-and-accumulate (MAC) operations in a single step.

This doctoral dissertation aims at improving DNNs from a technical point of view along two research paths. The first concerns the introduction of bio-inspired methods into the general architecture of neural networks to improve the capability of DNNs to recognize unknown images. Spike-timing-dependent plasticity (STDP), brain-inspired homeostasis, and neural redundancy are some of the elements included in the network to stabilize the learning process. The second research line covers the hardware design of a mixed-signal integrated circuit based on PCM synapses for the development of deep neural accelerators. Following the in-memory computing approach, the DNN weights are mapped onto NVM arrays. A generic 1-layer fully connected (FC) multi-layer perceptron (MLP) is proposed, whose trained weights are mapped into 4-bit unsigned digital words of "0s" and "1s", taking advantage of the wide resistive window of PCM devices between the high resistive state (HRS) and the low resistive state (LRS), respectively.
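As a concrete reference for the crossbar operation and the weight encoding described above, the following Python sketch models the MVM in the most basic way: input voltages multiply cell conductances (Ohm's law) and column currents sum (Kirchhoff's current law), with each 4-bit weight spread over four binary PCM cells. All device values (conductances, read voltage) are illustrative assumptions, not the figures used in the thesis.

```python
import numpy as np

# Illustrative device parameters (assumed, not taken from the thesis)
G_LRS = 100e-6   # low-resistive-state conductance in siemens, encodes a "1"
G_HRS = 1e-6     # high-resistive-state conductance in siemens, encodes a "0"
V_READ = 0.2     # read voltage applied to the active rows, in volts

def weight_to_conductances(word):
    """Spread one 4-bit unsigned weight over four binary PCM cells (MSB first)."""
    bits = [(word >> b) & 1 for b in (3, 2, 1, 0)]
    return np.array([G_LRS if b else G_HRS for b in bits])

def crossbar_mvm(G, x):
    """One MVM step: rows driven by x*V_READ (Ohm's law, I = G*V),
    column currents summed by Kirchhoff's current law."""
    return (x * V_READ) @ G   # shape: (columns,), one MAC result per column

# Example: four inputs, weights 13, 2, 7, 9 stored as four bit-slices each
G = np.stack([weight_to_conductances(w) for w in (13, 2, 7, 9)])  # 4 rows x 4 bit-columns
x = np.array([1, 0, 1, 1])                                        # binary input pattern
I_bits = crossbar_mvm(G, x)                                       # currents of the 4 bit-columns
mac = I_bits @ np.array([8, 4, 2, 1])  # digital recombination of the bit-slices
```

The digital recombination of the four bit-columns (with weights 8, 4, 2, 1) restores the full 4-bit MAC value after conversion, up to the small leakage contributed by cells left in the HRS.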
The circuit has been designed using the STMicroelectronics BJT-CMOS-DMOS (BCD) 90 nm design kit with an embedded 1T1R PCM cell, organized into several 5 kb arrays. The memory cells are manufactured with an optimized Ge-rich chalcogenide alloy and are stacked over the CMOS circuitry in the back end of line. The design addresses several circuit challenges, such as the implementation of the analogue-to-digital interface between the array and the input-output (IO) peripherals, and the signal processing required to drive the PCM devices. The circuit performs handwritten-digit recognition (MNIST) at high bandwidth (500 kHz) and low power (~200 mW). The whole MNIST inference runs in less than 0.8 s (256 mega operations per second, MOPS), far less than the time required by state-of-the-art standard von Neumann processors. Furthermore, the chip shows significant robustness to the non-idealities of PCM devices: the results are resilient to both drift and resistance variability, achieving almost the same classification accuracy on the MNIST dataset as the software baseline (~85%). This work highlights the main features, problems, and design requirements for efficiently implementing a hardware-integrated DNN using PCM cells. The adopted solutions and the obtained results are extensively described, pointing out the advantages of analogue in-memory computing for arithmetic calculations.

A summary of the main sections of this doctoral dissertation follows. Chapter 1 gives a short overview of the current learning and computing methods used in artificial intelligence, mainly focusing on so-called "in-memory computing" and describing its advantages and theoretical hardware implementation. Chapter 2 describes emerging non-volatile memories such as phase change memory (PCM) and filamentary and non-filamentary resistive switching RAM (RRAM), explaining the main physical properties of these devices and their suitability for implementing synaptic elements in neuromorphic engineering. Chapter 3 presents the digital development of a hybrid supervised-unsupervised neural network on the Xilinx Zynq-7000 System-on-Chip (SoC), capable of performing lifelong learning. The supervised part is a convolutional neural network that extracts generic features from a training dataset; the unsupervised section is a spiking winner-take-all (WTA) network that follows the STDP protocol. The inference results are validated on the correct classification of up to 5 untrained classes of the MNIST and Fashion-MNIST datasets and compared with PCM-based approaches. Chapter 4 covers the design of a new kind of hardware based on SiO$_X$ RRAM devices that complements the accuracy of convolutional neural networks with the flexibility of bio-inspired spike-timing-dependent plasticity. To enable the cohesion between the stable and the plastic sections of the network, the bio-inspired spike-frequency adaptation of the neurons is exploited, as it enhances the efficiency and accuracy of the network.
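Since Chapters 3 and 4 both build on STDP, a minimal sketch of the standard pair-based rule may help fix ideas: a synapse is potentiated when the presynaptic spike precedes the postsynaptic one and depressed otherwise, with exponentially decaying windows. The amplitudes and time constants below are illustrative assumptions; the networks in the thesis may use a different parameterization.

```python
import math

# Illustrative STDP parameters (assumed, not the thesis' values)
A_PLUS, A_MINUS = 0.01, 0.012   # potentiation / depression amplitudes
TAU_PLUS = TAU_MINUS = 20e-3    # decay time constants, seconds

def stdp_dw(t_pre, t_post):
    """Weight update for one pre/post spike pair (times in seconds)."""
    dt = t_post - t_pre
    if dt >= 0:                  # pre before post: potentiation
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    return -A_MINUS * math.exp(dt / TAU_MINUS)  # post before pre: depression

# A pre-spike 5 ms before the post-spike strengthens the synapse...
assert stdp_dw(0.000, 0.005) > 0
# ...while the reversed order weakens it
assert stdp_dw(0.005, 0.000) < 0
```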
Chapters 5, 6, 7, and 8 constitute the core of this doctoral dissertation, since they deal with the integrated design of DNNs. Chapter 5 introduces the main characteristics and application requirements of the 1-layer MLP integrated design: based on experimental measurements of the PCM 1T1R cell, Monte Carlo simulations validate the hardware architecture, and the obtained results support the chosen circuit implementation and the weight-mapping methodology. Chapter 6 describes the circuit design, the simulation results, and the physical realization (layout) of the analogue front end of the circuit, formed by a column decoder, an amplification stage, an output analogue buffer, and a voltage-dependent current generator based on a band-gap circuit. Chapter 7 explains the design of a successive approximation register (SAR) analogue-to-digital converter (ADC), used to convert the analogue voltage obtained from the MVM operation. Chapter 8 covers the design of the digital part of the circuit, formed by a serial peripheral interface (SPI) and a digital signal processor (DSP) that manages the readout of the input digital signals (e.g. an input image of the MNIST dataset) and processes the internal ADC conversions to obtain the output classification neuron. Finally, Chapter 9, included as a final appendix, offers an insight into the state of the art of bio-inspired computation, explaining the design of a homeostatic neuron based on the gradual crystallization of a PCM device. The presented research activity offers a complete insight into the current panorama of neuromorphic engineering, and this doctoral dissertation introduces novel studies that could lead to significant breakthroughs in the design of hardware-integrated neural accelerators based on emerging memory devices.
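As a pointer for the conversion step described in Chapter 7, the following behavioural sketch shows the binary search a SAR ADC performs: one bit is tested per clock cycle and kept only if the internal DAC output stays at or below the sampled input. The resolution and reference voltage are illustrative assumptions.

```python
def sar_adc(v_in, v_ref=1.0, n_bits=8):
    """Behavioural SAR ADC model: binary search over the DAC code."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)              # tentatively set the current bit
        v_dac = v_ref * trial / (1 << n_bits)  # DAC output for the trial code
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit
    return code

# Example: an MVM output voltage of 0.37 V with a 1 V reference
# converges in 8 cycles to code 94 (~ 0.37 * 256)
print(sar_adc(0.37))
```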
PERNICI, BARBARA
SOTTOCORNOLA SPINELLI, ALESSANDRO
4-feb-2021
Artificial intelligence covers a remarkably wide range of applications through mathematical algorithms and accurate statistical models that at times achieve better efficiency than humans. In particular, deep neural networks have recently shown remarkable capabilities in applications aimed at image recognition, speech understanding, and some logic games such as Go. However, the neural networks pursuing such goals have considerable limitations. First, most of these networks must be trained through procedures that are expensive in terms of time and energy. Second, such networks are poorly resilient and cannot easily adapt to different tasks, even within the same semantic domain. Drawing on the most recent technological discoveries and inventions, the goal of this thesis is the realization of hardware capable of removing some of the fundamental limitations of artificial intelligent systems. This goal is pursued, in particular, through "in-memory" computing, using non-volatile devices that can be stacked directly above the computational unit (the processor). In this way, the communication time between processor and conventional memory that is typical of current computing techniques, a problem known as the "von Neumann bottleneck", is completely eliminated.

This doctoral dissertation mainly investigates two non-volatile memory devices: filamentary resistive memory (RRAM) and phase change memory (PCM). Particular attention is devoted to PCM, which, thanks to its resistive window and its support for multi-bit programming, appears to be one of the most promising candidates for the technological revolution of the future. The research work described here addresses these topics along two fundamental paths. The first concerns the hardware reproduction of artificial intelligent systems capable of overcoming long-standing problems such as catastrophic interference: in order to identify and analyse the benefits of the new memory technologies, this aspect is studied by comparing an application based entirely on an FPGA with another based on PCM technology. A network based on RRAM devices that improves the state of the art in terms of accuracy and synaptic plasticity is also investigated. The second path concerns the integrated design, in Cadence Virtuoso, of a neural network using PCM devices. In this case, the study concerns the design of a mixed analogue-digital circuit that can recognize the 10,000 MNIST digits in less than 0.8 seconds (256 mega operations per second, MOPS) by exploiting the possibility of organizing the PCM devices in a crossbar. The parallel crossbar arrangement, in particular, enables matrix-vector products that are much faster and more efficient than with current technologies. The design of the neural accelerator is carried out using the STM BCD10 design kit with PCM integrated at a 90 nm technology node.
The design addresses all the current aspects and problems in the management of non-volatile memory devices, such as: the implementation of analogue-digital interfaces; the management of the input and output electronics for the correct serial and parallel transfer of information; and the design of an on-chip digital system (DSP) capable of organizing and managing all the most significant signals. The circuit recognizes images operating at a frequency of 500 kHz with low power consumption (200 mW), offering significant results in both purely technological and application terms. This doctoral dissertation also presents a circuit-level algorithm for integrated neural networks capable of overcoming some fundamental limitations of current devices. In particular, PCMs suffer from so-called "conductance drift", an increase of the resistive value of the device as a function of time and of its initial state. This physical phenomenon endangers the accuracy of artificial neural networks, which rely on a correct definition of their synaptic weights. An algorithm based on memristor compensation techniques, capable of fully recovering the accuracy of the network, is presented here in detail both for a multilevel and for a digital implementation of the weights. This doctoral dissertation offers a complete overview of neuromorphic and neural network engineering, in the hope that the numerous novelties introduced here may contribute to the birth of intelligent artificial systems capable of shaping the technological development of the future.
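The conductance drift mentioned above is commonly modelled as a power law, G(t) = G0 (t/t0)^(-nu); a minimal compensation sketch therefore multiplies the raw read current by the inverse drift factor. The exponent nu below is an illustrative assumption (values reported for amorphous PCM are typically in the 0.03-0.1 range), and the memristor-based compensation scheme in the thesis is more elaborate than this simple gain correction.

```python
# Power-law drift model and a minimal gain-based compensation (illustrative)
NU = 0.05   # drift exponent, assumed; typical amorphous PCM ~ 0.03-0.1
T0 = 1.0    # reference time after programming, seconds

def conductance_after_drift(g0, t):
    """G(t) = G0 * (t / t0)^(-NU): conductance decays over time."""
    return g0 * (t / T0) ** (-NU)

def compensated_current(i_raw, t):
    """Undo the expected average drift by scaling the read current."""
    return i_raw * (t / T0) ** NU

g0 = 100e-6                                   # programmed conductance, siemens
g_t = conductance_after_drift(g0, t=1e6)      # after ~11.6 days: ~50% of g0
i_ok = compensated_current(g_t * 0.2, t=1e6)  # ~ g0 * 0.2 A, as read at t0
```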


Use this identifier to cite or link to this document: https://hdl.handle.net/10589/177063