### Politecnico di Milano

Department of Electronics, Information and Biomedical Engineering

Doctoral Programme in Information Technology



## ULTRA LOW-POWER ANALOG AND MIXED-SIGNAL SoCs FOR SMART SENSORS APPLICATIONS

PhD Dissertation of: Stefano Brenna

Advisor: Prof. Andrea L. Lacaita

Co-Advisor: Prof. Andrea Bonfanti

Tutor: Prof. Carlo Fiorini

The Chair of the Doctoral Programme: Prof. Carlo Fiorini

2015 - XXVIII Cycle

To my family.

### Acknowledgements

I would like to thank a long list of people. Without some of them this work would not have been possible; without others it would have been much harder. In alphabetical order by first name:

Alessandro Spinelli, Andrea Abba, Andrea Bonetti, Angelo Geraci, Antonio Longoni, Carlo Samori, Carlo Fiorini, Davide Bignardi, Fabio Di Cerbo, Erkan Alpman, Francesco Caponio, Francesco Lazzarini, Giovanni Paolucci, Giovanni Marzin, Giacomo Langfelder, Giulio Roncolato, Isabella Cocco (special thanks), Katia De Gregorio (special thanks), Luca Bettini, Marco Allegretta (special thanks), Marco Angiolini, Mariano Da Rold, Mario Laudato, Michele Suraci, Nicola Ciocchini, Paolo Maffezzoni, Paolo Minotti, Riccardo Villa, Roberto Modaffari, Salvatore Levantino, Simone Balatti, Simone Pilli, Stefano Pellerano.

I thank my advisor, Prof. Andrea L. Lacaita. His sharp insights, wise guidance and innate ability to push people towards increasingly challenging goals were strong sources of energy and motivation during the hardest days of my PhD, and they still are. Finally, but first for the time spent and the effort made in teaching me what an IC designer really is, I want to express my deep gratitude to Prof. Andrea G. Bonfanti.

Everything I know about my job, I learnt it from them.

And, obviously, special thanks go to Andrea Fenaroli and Giovanni Marucci.

"... and way to go."

### Abstract

The relentless miniaturization of microelectronic technologies is drastically reducing power consumption, making possible the design of sensor nodes for distributed sensing and the integration of multiple sensing systems in portable consumer electronic devices. In these systems, low-power, low-noise front-ends have to acquire, digitize and transmit information. To improve energy efficiency, design solutions must be found that keep power consumption as low as possible while improving the efficiency of every circuit block. A/D converters with energy efficiency better than 20 fJ/conversion-step must be investigated, as well as high-efficiency transceivers for short-range radio links. To meet the recent trends of consumer and, more generally, portable electronic applications, three efficiency-oriented designs are presented in this work: two Earth magnetic field sensing systems, intended to support the development of indoor navigation, and a multichannel wireless neural probing system with state-of-the-art efficiency. In particular, readout integrated circuits for both a 3-axis Lorentz-force-based and a 3-axis AMR magnetic field sensing system were designed in $0.35\,\mu$m CMOS to provide signal amplification and digitization. The Lorentz-force-based sensing system is the first presented in the literature with integrated readout electronics; it works with a 3V supply and achieves a resolution of 28mGa and a programmable full-scale range up to 24Ga with less than 1mW power consumption per channel. Thanks to better sensor characteristics, the AMR sensing system is designed to achieve 4mGa resolution and a full-scale range of 10Ga while drawing only $180\,\mu W$ per axis from a 1.8V power supply; it is currently under measurement.

The neural probing system is fabricated in a $0.13\,\mu$m process and features 64 channels, each comprising a low-noise amplifier and a 10-bit ADC with 6fJ/conversion-step efficiency. The system is provided with a UWB wireless link able to transmit a 20Mbps bit stream to a receiver 7m away. The overall system power consumption is $965\,\mu W$ from a 0.5V supply; it is the lowest among multichannel (>32 channels) systems and is achieved with the widest transmission range.
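As a rough illustration of the efficiency figures quoted above, the sketch below uses the standard Walden figure of merit, $FOM = P / (2^{ENOB} \cdot f_s)$, expressed in joules per conversion step. The FOM, system power and link rate are taken from the abstract; the effective number of bits and the sampling rate are assumptions chosen only to make the arithmetic concrete, and the function and variable names are hypothetical.

```python
import math


def walden_fom(power_w: float, enob_bits: float, fs_hz: float) -> float:
    """Walden figure of merit, in joules per conversion step."""
    return power_w / (2.0 ** enob_bits * fs_hz)


def adc_power(fom_j: float, enob_bits: float, fs_hz: float) -> float:
    """Invert the Walden figure of merit to get the ADC power in watts."""
    return fom_j * (2.0 ** enob_bits) * fs_hz


# Figures quoted in the abstract.
ADC_FOM_J = 6e-15          # 6 fJ/conversion-step
SYSTEM_POWER_W = 965e-6    # whole neural probing system
LINK_RATE_BPS = 20e6       # UWB link bit rate

# Assumptions for illustration only (not stated in the abstract).
ASSUMED_ENOB_BITS = 10.0   # nominal resolution used in place of the measured ENOB
ASSUMED_FS_HZ = 200e3      # hypothetical sampling rate

adc_power_w = adc_power(ADC_FOM_J, ASSUMED_ENOB_BITS, ASSUMED_FS_HZ)
system_energy_per_bit_j = SYSTEM_POWER_W / LINK_RATE_BPS

print(f"ADC power under the assumptions: {adc_power_w * 1e6:.3f} uW")
print(f"System energy per transmitted bit: {system_energy_per_bit_j * 1e12:.2f} pJ/bit")
```

Because a real converter's ENOB is lower than its nominal resolution, the power computed here for a given FOM and sampling rate should be read as an upper estimate rather than a measured value.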

### Summary

The pace of miniaturization set by the progress of microelectronic technology is leading to a drastic reduction in the energy consumed by circuits, making possible the design of integrated multi-sensor systems suitable for consumer devices such as smartphones and tablets. In these systems, low-noise, low-power front-ends have to acquire, digitize and transmit the information coming from each sensor. To improve their efficiency, new design solutions are often required that optimize both the system architecture and the performance of the individual blocks. Analog-to-digital converters with efficiencies better than 20fJ/conversion-step and low-power, short-range transmitters are only some of the circuits required by this kind of application. This work presents three different designs of sensor systems that meet the most recent efficiency specifications demanded by consumer electronics, and by portable devices in particular. Specifically, it discusses the design of two systems for reading the Earth's magnetic field, intended for integration with motion sensors to enable better spatial sensing functions such as indoor navigation. In addition, it illustrates the design of a high-efficiency system for the acquisition and wireless transmission of neural signals for neuroscience experiments. Both magnetic field sensing front-ends are part of 3-axis (x-y-z) systems and are designed in $0.35\,\mu$m CMOS technology, but they interface with sensors of different nature: the first is based on the Lorentz force, the second exploits the anisotropic magnetoresistive (AMR) properties of specific materials (alloys).

The Lorentz sensing system exploits recent MEMS technology and is the first example presented in the literature of a system comprising both the sensor and integrated readout electronics. It is supplied at 3V and offers a resolution of 28mGa and a full-scale range programmable up to 24Ga, with a consumption below 1mW per channel. Thanks to a more mature sensor manufacturing technology, the AMR system is designed to guarantee up to 4mGa of resolution over a full scale of 10Ga, drawing only $180\,\mu W$ per axis from a 1.8V supply; its performance is currently being measured.

Finally, the neural probe is fabricated in $0.13\,\mu$m CMOS technology and has 64 acquisition channels, each comprising a low-noise amplifier and an analog-to-digital converter with 6fJ/conversion-step efficiency. Wireless transmission is provided by a UWB link able to sustain a data stream of up to 20Mbps over a transmission range of up to 7m. With a consumption below $965\,\mu W$ and a 0.5V supply, it is the most efficient multichannel (>32 channels) system for neural experiments on laboratory animals, and it achieves the longest transmission range among those presented in the literature so far.

## Contents

| Section | Title |
|---------|-------|
| 1 | Introduction |
| 1.1 | Sensing the environment |
| 1.2 | Front-end design optimization |
| 1.2.1 | Analog signal conditioning |
| 1.2.2 | Analog to digital conversion |
| 1.3 | Presented work |
| 1.3.1 | Lorentz Force Magnetometer |
| 1.3.2 | AMR Magnetometer |
| 1.3.3 | Wireless neural probing system |
| 2 | Lorentz Force Magnetometer |
| 2.1 | Magnetometer design |
| 2.1.1 | Mechanical simulations and design |
| 2.1.2 | Electrical simulations and design |
| 2.1.3 | Device packaging and overall dimensions |
| 2.1.4 | Prediction and validation of the sensitivity |
| 2.1.5 | Prediction of thermo-mechanical noise |
| 2.2 | Front-end electronics |
| 2.2.1 | Amplification stages |
| 2.2.2 | Noise analysis |
| 2.2.3 | Downconversion and filtering |
| 2.3 | Measurements results |
| 2.3.1 | Sensitivity and bandwidth |
| 2.3.2 | System resolution and power consumption |
| 2.3.3 | Perspective for driving circuit integration |
| 2.4 | Conclusions and perspectives |
| 3 | AMR magnetometer |
| 3.1 | Motivation |
| 3.2 | System requirements |
| 3.3 | The sensor |
| 3.3.1 | Operating principle |
| 3.3.2 | Implementation and features |
| 3.3.3 | Integration |
| 3.4 | System overview |
| 3.4.1 | Fundamental considerations |
| 3.4.2 | Architecture |
| 3.4.3 | Set-Reset feature |
| 3.5 | Mixed-signal Front-End |
| 3.5.1 | Amplifier |
| 3.5.2 | SAR ADC |
| 3.6 | Simulation results |
| 3.6.1 | Analog front-end |
| 3.6.2 | ADC |
| 3.6.3 | Power anatomy |
| 3.7 | Conclusions |
| 4 | Neural probing System-on-Chip |
| 4.1 | System Architecture and Circuit Implementation |
| 4.1.1 | Analog front-end |
| 4.2 | The charge redistribution SAR ADC analysis and optimization |
| 4.2.1 | Comparison between CBW and BWA topologies |
| 4.2.2 | Circuit design |
| 4.2.3 | Measurement Results |
| 4.2.4 | Conclusions |
| 4.2.5 | UWB transmitter |
| 4.2.6 | UWB receiver |
| 4.3 | Experimental Results |
| 4.4 | Conclusions |
| 5 | Other Activities |
| 5.1 | Introduction |
| 5.2 | A tool for the assisted design of charge redistribution SAR ADCs |
| 5.3 | Converter Topologies |
| 5.3.1 | Classic Binary Weighted Array (CBW) |
| 5.3.2 | Split Binary Weighted Array (SBW) |
| 5.3.3 | Binary Weighted with Attenuation Capacitor (BWA) |
| 5.4 | Tool working principle |
| 5.5 | Capacitive array model |
| 5.5.1 | CBW Model |
| 5.5.2 | SBW Model |
| 5.5.3 | BWA Model |
| 5.6 | Switching energy computation |
| 5.7 | Design Flow |
| 5.8 | Simulation and Measurements Results |
| 5.8.1 | Static Metrics |
| 5.8.2 | Dynamic Metrics |
| 5.8.3 | Switching Energy |
| 5.8.4 | Simulation Time |
| 5.9 | Conclusions |

## List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | Efficiency chart of SAR and SDM implementations as a function of resolution, taken from works presented at VLSI and ISSCC over the last 10 years. The dotted line represents the efficiency limit imposed by noise constraints for a generic Nyquist-rate converter [1]. |
| 2.1 | SEM photograph of the device showing the diamond-shaped tuning fork, the current recirculation concept, implemented through 10 metal coils deposited over the springs, and the parallel plate cells, used for the capacitive readout and tuning. |
| 2.2 | Results of FEM simulations for (a) the anti-phase mode, excited by the Lorentz current flowing in opposite directions through the springs, and for (b) the in-phase mode. [...] fork geometry, coupled to the holding bars, helps in shifting the in-phase mode to frequencies higher than for the anti-phase mode. |
| 2.3 | [...] (x-axis left end) to the output (x-axis right end). The curves refer to different metal widths and cross sections or to different link geometries. The partial current loss in the first loop decreases if the resistance of the Al path is decreased (wide metal), or when the links resistance increases (serpentines). |
| 2.4 | Details of a SEM photograph showing the metal-over-poly structure at one spring |
| 2.5 | Mechanical sensitivity of the magnetometer. The blue squares correspond to measurements, the green dotted line is the best linear fitting. The inset shows the linear fitting $(12.4 \text{ mT})$ |
| 2.6 | Block diagram of the sensing readout circuit (a) and a simplified single-ended scheme of the amplifier (b). |
| 2.7 | Schematic representation of the driving and the readout circuits. |
| 2.8 | Simplified single-ended electrical scheme of the amplifying chain, highlighting the internal two-stage architecture of the OTAs of the two amplifiers |
| 2.9 | Transistor-level implementation of OTA1, with its common-mode feedback network. |
| 2.10 | Transistor-level implementation of OTA2, adopted for the capacitive amplifier, with its common-mode feedback network |
| 2.11 | [...] OTAs (b) |
| 2.12 | ASIC die microphotograph with the highlighted circuit blocks. Note the reduced PAD area to minimize parasitic capacitances. |
| 2.13 | Photograph showing the wire bonding of the stacked MEMS and ASIC dies |
| 2.14 | System sensitivities, evaluated for a 200 Hz mismatch, for two values of the driving current. |
| 2.15 | Input-referred capacitive noise spectral density of the readout electronics as a function of the analog front-end power consumption |
| 2.16 | Equivalent capacitive noise spectral density of the system measured at different frequency mismatches |
| 2.17 | Input-referred magnetic field noise spectral density of the system measured at different frequency mismatches |
| 2.18 | System noise performance, displayed as input-referred Allan standard deviation, evaluated at 200 Hz mismatch, using a $107\ \mu\text{A}$ driving current |
| 2.19 | Transistor-level view of the Pierce oscillator. Coupled to a Tang resonator, the circuit implements the driving stage at the oscillation frequency of the resonator, delivering the desired AC current through a resistive load |
| 3.1 | Simplified scheme of the magnetoresistive property: the resistance value depends on the angle $\theta$ between the current flow direction and the material magnetization vector, which depends on the surrounding magnetic field |
| 3.2 | Scheme of a Barber pole-based AMR element and its resistive characteristic as a function of the angle between the magnetization vector and the average direction [...] |
| 3.3 | AMR Wheatstone bridge with Set-Reset coils (a) and an example of Set-Reset offset cancellation technique scheme (b) |
| 3.4 | [...] |
| 3.5 | Simplified scheme of signal conditioning chain devoted to the readout of a single-[...] |
| 3.6 | Fundamental Wheatstone bridge readout scheme in which the only branch is sensed by means of a common source mirrored amplifier. |
| 3.7 | AMR readout ASIC architecture |
| 3.8 | [...] sensed axes |
| 3.9 | [...] |
| 3.10 | Timing diagram of the AMR analog readout control signals |
| 3.11 | [...] digital conversion |
| 3.12 | [...] (which is also the S&H), the comparator and the logic. |
| 3.13 | 14-bit CBW (a) and BWA (b) DAC array topology |
| 3.14 | [...] |
| 3.15 | Section showing the custom unit elements implementation and a placement and [...] |
| 3.16 | [...] The shield reduces the top-to-bottom parasitics. |
| 3.17 | Detail of the DNL periodic peaks (a) and the related non-linearity of the I/O characteristic (b) due to $C_{par,sub}$ of 480fF, with a $C_u$ equal to 250fF. |
| 3.18 | BWA array provided with calibration modules (a) and details of their implementation (b) |
| 3.19 | Calibration modules control circuit (a) and its working principle (b) |
| 3.20 | Details of the implementation of the 14-bit SAR logic and of its connection to the DAC and the calibration bits. |
| 3.21 | Temporizer and the timing diagram of its related signals. |
| 3.22 | Two-stage comparator composed of a preamplifier and a latch. |
| 3.23 | Transistor-level implementation of the comparator. |
| 3.24 | Front-end output noise with and without correlated double sampling technique. |
| 3.25 | DNL and INL (a) and a related detail of the I/O characteristic (b) of the ADC with and without calibration |
| 3.26 | Statistical distribution of the effective number of bits due to parasitic and technological mismatch affecting the DAC |
| 3.27 | Simulated ADC power consumption at different sampling frequencies |
| 3.28 | System power anatomy |
| 4.1 | Block scheme of the fully-integrated wireless neural recording SoC. |
| 4.2 | Detailed schematic of the recording channel with the 2-stage amplifier and the 10-bit binary-weighted with attenuation capacitor SAR ADC. |
| 4.3 | Schematic of the proposed 10-bit converter. |
| 4.4 | Schematic of an N-bit CBW (a) and of an $(m + l)$-bit BWA (b) capacitive DAC. The stray capacitances affecting the arrays are also shown. |
| 4.5 | Average (a) and standard deviation (b) of ENOB as a function of $\sigma_{DNL,max}$ for a 10-bit CBW and BWA charge redistribution capacitive DAC. |
| 4.6 | Simulated DNL (a) and INL (b) for a 10-bit BWA DAC featuring $C_u = 100$ fF and $C_{par,sub} = 50$ fF. |
| 4.7 | Switching energy versus output code. |
| 4.8 | Adopted layout scheme for the capacitive DAC of one branch (D stands for dummy element). |
| 4.9 | Schematic of the dynamic comparator. |
| 4.10 | Deterministic effect of the comparator input capacitance on the INL curve |
| 4.11 | Effect of the comparator on the INL curve considering a mismatch of the aspect ratio between the input transistors |
| 4.12 | Logic temporizer. |
| 4.13 | Timing diagram. |
| 4.14 | Schematic of the asynchronous logic with the details of the dynamic differential latch (DDL) and the dynamic flip-flop (DFF). |
| 4.15 | Die photo of the ADC. |
| 4.16 | Measured DNL and INL at 0.5-V supply voltage. |
| 4.17 | Measured spectrum with an input sine-wave at 5.13 kHz and 96.48 kHz for 200-kSps sampling frequency and 0.5-V supply |
| 4.18 | Measured power consumption for different supply voltages |
| 4.19 | Measured FOM of state-of-the-art SAR ADCs. |
| 4.20 | Simplified schematic of the DCO with the transformer driving the antenna (a) and measured pulse waveform (b). |
| 4.21 | Simplified schematic of the charge-pump circuit |
| 4.22 | Measured frequency response (a) and input-referred noise (b) for the channel amplifier. |
| 4.23 | ADC output spectrum (a) and static non-linearity (b) |
| 4.24 | Measured BER vs distance curve |
| 4.25 | (a) Neural trace transmitted by the wireless link and (b) comparison between original and reconstructed spike. |
| 5.1 | Generic SAR ADC architecture with a capacitive DAC in the feedback path. |
| 5.2 | Schematic of an N-bit CBW array. |
| 5.3 | Schematic of an N-bit SBW array. |
| 5.4 | Schematic of an N-bit BWA array. |
| 5.5 | CSAtool block diagram. |
| 5.6 | Conversion characteristic for a 3-bit single-ended AD converter. The analog input transition levels are set by the DAC output. |
| 5.7 | $4^{th}$ bit evaluation step of a 6-bit CBW converter. The capacitance $C_4$ is switched |
|      | to $V_{DD}$ .                                                                                                                                | 92  |
| 5.8  | Typical SAR ADC design flow with a comparison between traditional approach                                                                   |     |
|      | and CSAtool performance.                                                                                                                     | 93  |
| 5.9  | Schematics of the a) static and b) dynamic performance evaluation with a                                                                     |     |
|      | traditional Spice-like simulator.                                                                                                            | 94  |
| 5.10 | Die photograph of the two measured prototypes adopting a) an 8-bit single-ended                                                              |     |
|      | CBW and b) a 10-bit fully-differential BWA DAC.                                                                                              | 95  |
| 5.11 | Layout of the DAC of the prototyped SBW charge redistribution converter with                                                                 |     |
|      | the detail of the connections between the adopted PiP capacitors                                                                             | 95  |
| 5.12 | Comparison between DNL and INL of the 10-bit SBW ADC prototype estimated                                                                     |     |
|      | by Cadence Spectre simulations (black lines) and by CSAtool (red lines)                                                                      | 96  |
| 5.13 | Comparison between DNL and INL characteristics of the 8-bit CBW ADC prototype                                                                |     |
|      | estimated by Cadence Spectre simulations (black lines) and by CSAtool (red lines)                                                            | 96  |
| 5.14 | Measured DNL and INL of the fabricated 8-bit CBW ADC prototype                                                                               | 97  |
| 5.15 | Comparison between DNL and INL characteristics of the 10-bit BWA ADC prototype                                                               |     |
|      | estimated by Cadence Spectre simulations (black lines) and by CSAtool (red lines).                                                           | 97  |
| 5.16 | Effect of floating dummy capacitors on the 10-bit BWA array.                                                                                 | 98  |
| 5.17 | Measured DNL and INL of the fabricated 10-bit BWA ADC prototype.                                                                             | 99  |
| 5.18 | Standard deviation of DNL and INL as a function of the output code for the                                                                   |     |
|      | a) 8- and the b) 10-bit SBW and c) BWA converters considering the technology                                                                 |     |
|      | capacitive mismatch.                                                                                                                         | 99  |
| 5.19 | Simulated SNDR as a function of the input signal amplitude for the a) 8-bit                                                                  |     |
|      | CBW, b) the 10-bit SBW and c) 10-bit BWA designed converters.                                                                                | 101 |
| 5.20 | Average switching energy for the three converter topologies. Dashed lines refer ...                                                          | 102 |
| 5.21 | Switching energy as a function of the output code and normalized to the unit                                                                 |     |
|      | element for a fully-differential BWA ADC switched with the a) traditional and b)                                                             |     |
|      | the monotonic algorithm. Black lines refer to CSAtool results while dashed lines                                                             |     |
|      | show Cadence simulations                                                                                                                     | 102 |
| 5.22 | Screenshot of the CSAtool graphic user interface.                                                                                            | 104 |

# List of Tables

| 2.1 | Device parameters                                                                               | 13  |
|-----|-------------------------------------------------------------------------------------------------|-----|
| 2.2 | Comparison of the presented magnetic field sensing system performance with the state-of-the-art | 30  |
| 3.1 | Design requirements                                                                             | 35  |
| 4.1 | Comparison of the CBW and BWA arrays performance                                                | 62  |
| 4.2 | Capacitance comparison                                                                          | 63  |
| 4.3 | Comparison with the state of the art                                                            | 75  |
| 4.4 | Comparison of wireless neural recording ICs                                                     | 82  |
| 5.1 | Estimates of $\sigma_{DNL,max}$ and $\sigma_{INL,max}$                                          | 100 |
| 5.2 | Single simulation time                                                                          | 103 |
| 5.3 | MonteCarlo simulation times for 100 runs                                                        | 103 |

# Chapter 1 Introduction

As the mobile market approaches saturation, the consumer electronics and semiconductor industries are actively seeking new fields to enter. Biomedical devices and wearable equipment are often regarded as potential protagonists of the next generation of devices that will become an integral part of the daily life of millions of people. By exploiting the performance of silicon technology, consolidated over more than fifty years of development, it is possible to design ASICs and SoCs devoted to a wide range of applications: wireless sensor networks, implantable micro-systems and efficient inertial and magnetic sensing systems are significant examples.

In this context, efficiency-oriented microelectronic design represents the latest trend and the main challenge in the field of analog (but also digital) integrated circuit design.

### 1.1 Sensing the environment

Most of the applications described above require an analog front end to read the desired quantities from the surrounding environment (temperature, pressure, humidity, acceleration, speed...). Indeed, the world we live in is full of potentially useful information, and this information is in analog form. The measurement, that is the translation of any desired physical quantity into an electrical one, is realized through sensors.

In most cases, the output of a sensor requires amplification and conditioning to provide the cleanest possible signal to an analog-to-digital converter, which converts it into a more reliable, compressed form of information. In fact, once the signal is coded in digital form, it can be transferred more easily, potentially without losses and at the highest communication speed allowed by silicon technology.

This chain of conditioning circuitry is called *Analog (mixed-signal) Front End* (AFE) and can include amplifiers, filters, mixers and even the ADC. A sensing System on Chip (SoC) is typically composed of a front-end with multiple channels, each devoted to the readout of a sensor output.

Due to the complexity of the task, designing an entire low-power SoC with accuracy comparable to the state of the art or to competitor products often requires a transversal knowledge of microelectronics. The optimization of such a system implies the understanding of architectural issues as well as of the working principles and key features of the several circuit blocks adopted in the project. Moreover, awareness of the technological implications and limits of the design may lead to better results. The designer's technological options range from the most scaled nodes, such as 28-nm, 40-nm and 65-nm, to the 90-nm, 130-nm and 350-nm nodes that are largely employed in mixed-signal design.

It cannot be taken for granted that the most scaled technologies are the best choice for the design of a SoC, and this is not only a matter of cost. Recent surveys highlight how mature technology nodes such as 130-nm, 180-nm and 350-nm still account for a large number of tape-outs [2].

Analog signal processing, which is critical in most sensing applications, does not benefit from scaling as digital processing does, and it often requires front-ends that dominate the power consumption, significantly relaxing the need for efficient digital circuitry. On the other hand, speed specifications may require the adoption of faster transistors. All these aspects must be taken into account to plan and execute the design of circuits and systems with state-of-the-art efficiency.

### 1.2 Front-end design optimization

Optimizing a front-end is a necessary task in all portable applications, but no universal guidelines are available. The target application imposes the system requirements, determining both the performance metrics of interest and their target values. It strongly influences the operating regime, the system architecture and the sizing criteria of the circuit components.

The physical and electrical characteristics of the sensors often establish the circuits that are required for the signal conditioning.

The same quantity (pressure, electric field, magnetic field, temperature...) can be detected indirectly by sensing a capacitance, a resistance, a charge, a voltage, a delay or even a frequency. Most of these classes of sensors can be modeled with an equivalent circuit composed of passive elements only (typically capacitors and resistors), but since the equivalent network and its component values change from sensor to sensor, it is not possible to define a unique design approach.

What is true in general is that, once the signal is translated into an electrical quantity (voltage or current), it first has to be conditioned in analog form and then converted to digital. The step at which the conversion occurs can also vary, but in most sensing systems the analog signal conditioning performs an amplification, after which the digitization occurs. Since the signal available at the sensor output is very often small and affected by background noise, the analog front-end should amplify it without introducing significant noise or distortion. This typically requires low-noise amplifiers as first stages, which dominate the power consumption. Then, depending on the required resolution, the system output data rate and the system bandwidth, the power consumption of the ADC and of other ancillary circuits also becomes an issue.

### 1.2.1 Analog signal conditioning

Technology scaling mostly benefits the performance of digital circuits, but it also improves some characteristics (matching, for example) of analog circuits. More than a decade after the beginning of large-scale VLSI circuit design, analog circuits, and amplifiers above all, are still irreplaceable for the implementation of the interface circuitry between digital processing and the external world [3].

Sensing systems are the main example of circuits in which the analog front-end is irreplaceable and in these applications it is also what typically limits the efficiency. For this reason, most of the design efforts must be focused on the minimization of the power consumption of the first signal conditioning analog blocks.

Unfortunately, due to the variety of the available sensors and to the high variability of the design characteristics as a function of the sensing principle and of the electrical characteristics of the sensor, it is not possible to define a universal optimization criterion.

Some front-ends require continuous-time operation, others are operated in a power-switching regime; some require a low input impedance, others a high input impedance. It is not possible to define a single figure of merit for all classes of amplifiers.

In some cases, which will be discussed in the chapters related to the realized ASICs, it is possible to evaluate the efficiency of the amplifying stage by means of a *Figure of Merit* (FoM). Depending on the circuit, the FoM takes into account a specific trade-off between two or more performance metrics that are inversely dependent on each other, typically power consumption, accuracy (SNDR, ENoB or input-referred noise) and bandwidth. Moreover, especially as far as accuracy is concerned, there is no unique metric that can be considered. In most cases thermal noise limits the accuracy and the resolution of a system and can be considered the parameter of interest. However, flicker noise also has a big impact in CMOS continuous-time circuits; it obeys different physical phenomena and cannot be included in most thermal-noise-based figures of merit. In other circuits mismatch and offset limit the performance [4], and common-mode and power-supply rejection can also be a main issue.

A good analog front end must deal with all these problems, and its optimization cannot overlook the specific task for which it is conceived. In this work, all the designed purely analog circuits and amplifiers will be discussed with reference to the metrics and the figures of merit (when available) that are suitable for the application.

### 1.2.2 Analog to digital conversion

A complete sensing SoC is in general a multi-channel system. Multi-axial inertial sensors require one signal conditioning channel per axis, and biomedical sensors must guarantee the readout of multi-electrode arrays with one channel per electrode. The architectural choice must be made according to the number of channels $N_{ch}$, the available area, the required number of bits $N$ and the bandwidth of the system, which sets the sample rate. The chosen ADC topology can also influence this choice.

There are two main alternatives: a multiplexing/single-ADC scheme and a one-ADC-per-channel scheme. The first is preferable when the resolution is high, when the available pitch for a single channel is small, and when both the number of channels and the required sampling frequency are limited. The second, on the contrary, is interesting when a large number of channels is required, with a lower resolution but a higher sampling rate. In between there are intermediate architectural solutions [5] that approach the optimum, but at the cost of a strongly reduced design reusability of the converter.

Since the aim of this work was not only to present optimized architectures but also to design building blocks with state-of-the-art efficiency, intermediate architectural solutions have been avoided.

Figure 1.1: Efficiency chart of SAR and SDM implementations as a function of the resolution, taken from the last 10 years of works presented at VLSI and ISSCC. The dotted line represents the efficiency limit imposed by noise constraints for a generic Nyquist-rate converter [1].

Analog-to-digital converters can be divided into two categories, 1) Nyquist-rate and 2) oversampling, based on the relationship between the sampling frequency ($f_{sample}$) and the maximum input signal frequency. Nyquist-rate ADCs convert signals whose frequency is lower than the Nyquist frequency ($f_{sample}/2$). When the input signal bandwidth is low, however, it is possible either to clock a Nyquist-rate ADC at a rate well above the minimum Nyquist requirement or to adopt intrinsically oversampling converters such as sigma-delta ADCs, with a desired oversampling ratio (OSR), defined as the ratio between the sampling frequency and the Nyquist rate. For a fixed $f_{sample}$, SNR improves as the OSR increases. However, since the circuit itself operates independently of the OSR, the ADC power consumption increases. With more than 100 implementations, successive-approximation-register (SAR) and delta-sigma ($\Delta\Sigma$) analog-to-digital converters (ADCs) alone account for more than 60% of the converters presented at ISSCC and at the VLSI Circuits Symposium in the last decade, and they represent the most efficient topologies for the Nyquist-rate and oversampling categories, respectively.
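The SNR benefit of oversampling mentioned above can be quantified with a small sketch. It assumes the textbook case of white quantization noise, an ideal digital filter that removes all out-of-band noise, and no noise shaping, so it describes an oversampled Nyquist-rate converter rather than a sigma-delta modulator. The function names and the numerical values (200 kSps, 25 kHz) are illustrative assumptions and are not taken from the designs presented in this work.

```python
import math


def ideal_snr_db(n_bits: int) -> float:
    """Quantization-limited SNR of an ideal N-bit ADC with a full-scale sine input."""
    return 6.02 * n_bits + 1.76


def osr_from_rates(f_sample_hz: float, signal_bw_hz: float) -> float:
    """Oversampling ratio: sampling frequency divided by the Nyquist rate (2 * BW)."""
    return f_sample_hz / (2.0 * signal_bw_hz)


def oversampling_gain_db(osr: float) -> float:
    """SNR gain of plain oversampling (white noise, no noise shaping): 10*log10(OSR)."""
    if osr < 1.0:
        raise ValueError("OSR must be >= 1")
    return 10.0 * math.log10(osr)


if __name__ == "__main__":
    # Hypothetical example: 10-bit converter, 200 kSps clock, 25 kHz signal band.
    osr = osr_from_rates(200e3, 25e3)
    gain = oversampling_gain_db(osr)
    print(f"OSR = {osr:.1f}, SNR gain = {gain:.2f} dB")
    print(f"In-band SNR = {ideal_snr_db(10) + gain:.2f} dB")
```

Under these assumptions every factor of 4 in OSR is worth about 6 dB, that is roughly one additional effective bit.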

The reasons for this popularity can be found in the intrinsic power efficiency and amenability to scaling of SAR converters on one side, and in the robustness against circuit imperfections of $\Delta\Sigma$ converters on the other. Fig. 1.1 plots resolution versus bandwidth for state-of-the-art SAR and $\Delta\Sigma$ converters published in the last decade. Two distinct regions can be clearly identified. SAR ADCs dominate the applications requiring bandwidths higher than 100 MHz and resolutions below 60 dB. $\Delta\Sigma$ ADCs, instead, are the preferred choice below 1 MHz bandwidth and above 12 bits of effective resolution (e.g. audio, high-resolution biomedical or sensing applications). In between lies a region equally populated by the two architectures, which corresponds to the requirements of several applications, such as wireless communications.

Given a designed circuit, a benchmark or criterion must be defined to evaluate its efficiency. Most of the presented works can be compared by referring to a so-called *Figure of Merit* (FoM), which should rank how well a given design addresses a critical physical trade-off. For ADCs, the critical trade-off varies with requirements such as speed and resolution, so there is no unique figure of merit. The *Walden* figure of merit [6] assumes a non-thermal trade-off between energy and resolution and is therefore suitable mostly for low-to-moderate resolution and moderate-to-high speed designs [7], such as those where SAR ADCs are largely employed.

$$FoM_W = \frac{P_{ADC}}{2^{ENoB} \cdot 2 \cdot BW_{signal}} \tag{1.1}$$

As will be justified in the related chapters, SAR ADCs have been preferred to sigma-delta converters for the system implementations presented in this work. For this reason, and when appropriate, their efficiency will be expressed by means of this FoM, in joules per conversion step.
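As an illustration of how Eq. (1.1) is applied, the following sketch evaluates the Walden figure of merit from power, ENoB and signal bandwidth. The helper that derives ENoB from SNDR uses the standard relation ENoB = (SNDR - 1.76)/6.02, which is not stated in the text above, and the numerical values (1 µW, 9 effective bits, 50 kHz) are hypothetical and do not refer to any converter described in this work.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, enob: float, signal_bw_hz: float) -> float:
    """Walden FoM of Eq. (1.1), in joules per conversion step."""
    return power_w / (2.0 ** enob * 2.0 * signal_bw_hz)


if __name__ == "__main__":
    # Hypothetical converter: 1 uW, 9 effective bits, 50 kHz signal bandwidth.
    fom = walden_fom(power_w=1e-6, enob=9.0, signal_bw_hz=50e3)
    print(f"FoM_W = {fom * 1e15:.2f} fJ/conversion-step")
```

Because the denominator grows as $2^{ENoB}$, each additional effective bit obtained at constant power and bandwidth halves the FoM.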

### 1.3 Presented work

This thesis summarizes the activity carried out during the last four years on the analysis, optimization and design of circuit architectures devoted to different classes of sensors. The goal of each project was to approach, if not exceed, the efficiency of state-of-the-art systems presented in literature or, when available, on the market. The architectural optimization has always been pursued without losing sight of the efficiency of each circuit block, in the belief that a system is truly optimized not only through a clever choice of its architecture but also by pushing all its building blocks towards the best efficiency. In particular, the designs of three different sensing systems are presented in this work:

- 1. A 3-axis Lorentz force based magnetometer for Earth field detection and digital compass applications in 350-nm CMOS;
- 2. A 3-axis AMR based magnetometer for Earth field detection and digital compass applications in 350-nm CMOS;
- 3. A 64-channel neural probing system with UWB wireless transmission in 130-nm CMOS.

The choice of the technology node is in line with the reasonable costs of moderate-bandwidth sensing applications and with the recent trends of analog-mixed-signal tape-outs per technology node [2]. All the architectures and circuit blocks described in this work have been designed and measured (or are currently under measurement) at Politecnico di Milano by the author, with the supervision of Prof. Bonfanti and the guidance of Prof. A.L. Lacaita. The only exception is the UWB transceiver adopted in the last of the above-mentioned systems, which was designed by the team of the ICARUS Lab of the University of Padua, Italy.

All the designed systems include an AFE, some of them also include an ADC, and the last one is a complete SoC comprising a wireless transceiver.

They cover a significant variety of sensors, in particular they are related to:

- 1. A capacitive MEMS sensor;
- 2. A resistive (low-moderate impedance) AMR sensor;
- 3. A capacitive (high impedance) neural probe;

Apart from an efficiency-oriented design and, for two of them only, a common target application, these three designs do not share much in terms of sensing principle or circuit specifications. For this reason they will be discussed separately, in three different chapters following the order presented above, each with dedicated final considerations. To avoid repetitions, cross-references between sections of different chapters will be made when possible.

In the following sections, a brief overview of the three designs is provided, highlighting the improvements achieved with respect to the state of the art.

### 1.3.1 Lorentz Force Magnetometer

The design trade-offs for capacitive magnetometers coupled to integrated driving and readout circuits, which are mandatory in the perspective of integrating MEMS digital compasses in portable devices, have not been investigated yet. Moreover, an integrated system composed of sensor, driving and readout integrated electronics has not yet been presented in literature for Lorentz force MEMS magnetometers. The aim of this work is to propose a complete sensing system, to provide design criteria and investigate trade-offs able to guide future optimization of this technology, and to demonstrate its suitability for portable electronics applications. The work presents a novel z-axis Lorentz force magnetometer coupled to a low-power readout integrated circuit, the first presented for Lorentz sensor readout. The sensor is fabricated using an industrial process and exploits a multi-loop architecture that gives a 5-fold sensitivity and resolution improvement, useful for off-resonance operation. The integrated readout, which has been the specific task of the author in this project, is based on a differential capacitively-coupled amplifier front-end, designed to be balanced with the sensor in terms of noise and power consumption. The overall system, including also demodulation and filtering stages, provides a programmable full-scale up to $\pm 2.4$ mT, limited by the electronics saturation. The bandwidth can be programmed up to 150 Hz for off-resonance operation, where a sub-400 nT/$\sqrt{Hz}$ resolution at 775-$\mu$W power consumption is obtained, including the integrated readout and the Lorentz current. Perspectives on low-consumption driving oscillators for off-resonance mode are finally given. The power consumption of this system, including the current necessary to drive the sensor and the current required to bias the readout circuit, is the lowest among the Lorentz sensing systems presented in literature. Moreover, it is the first to present an ASIC for readout and driving. These features represent an improvement over the typically adopted discrete-component electronics, which are intrinsically less efficient and less compact, and which introduce a larger amount of parasitics that degrade the circuit performance.

### 1.3.2 AMR Magnetometer

The design of a 3-D digital magnetometer for consumer electronic compasses, compliant with the efficiency requirements of future indoor navigation, is presented here. The system is composed of three Anisotropic Magneto-Resistive (AMR) sensors and a driving and sensing ASIC realized in CMOS $0.35\mu$m AMS technology. The readout front-end is composed of a fully-differential instrumentation amplifier featuring an auto-zeroing technique to cancel the electronic offset and reduce the flicker noise contribution. The aim of this work was to explore the design trade-offs for an optimized magnetic sensing system based on AMR technology, thus on a low-impedance, resistive sensor. In fact, although this is not the first integrated system of its kind, since several commercial products are available on the market, a detailed analysis of design and sizing criteria is still lacking. The only similar integrated system presented in literature is described in [8], but it is configured for a different target application (aerospace) and comes without a detailed description of design trade-offs. In this work we discuss the design of each circuit block of the front-end, presenting equations that can be adopted as a reference to deal with noise, bandwidth and power consumption specifications. While measurements are still in progress, simulation results validating the design are presented, showing that the system guarantees a bandwidth of 50Hz, a 4mGa magnetic resolution and a 20Ga FSR, in line with commercial products.

### 1.3.3 Wireless neural probing system

This work aims at showing the feasibility of a very low power integrated circuit for multichannel recording and wireless transmission of raw data (signal frequency up to 10kHz) at high bit-rate (20Mbps), and it presents a functioning prototype with the lowest power-per-channel among the neural probing systems that are able to record the neural activity from the Local Field Potentials (LFPs) to the Action Potentials (APs). These features make the proposed circuit a viable option in the perspective of both head-mounted and implantable BMI applications, once provided with an energy harvesting solution, such as inductive powering, and a miniaturized UWB antenna, which are, however, outside the scope of this work. The system-on-chip (SoC) features 64 channels with a 20-Mbps wireless telemetry. Each channel of the analog front-end consists of a low-noise band-pass amplifier, with an input-referred noise of $5.6\mu$Vrms in a 0.001-10 kHz band, and a 31.25-kSps 6fJ/conversion-step 10-bit SAR A/D converter. The recorded signals are multiplexed in the digital domain and transmitted via an 11.7%-efficiency pulse-position-modulation UWB transmitter (TX), reaching a transmission range in excess of 7.5m. The chip has been fabricated in a 130-nm CMOS process, measures $25mm^2$ and dissipates $965\mu$W from a 0.5-V supply. This SoC features the lowest power-per-channel ($15\mu W$-per-channel) among state-of-the-art wireless neural recording systems with more than 32 channels and able to detect both LFPs and APs. The proposed circuit is able to transmit the raw neural signal in a large bandwidth (up to 10-kHz), without performing any data compression or losing vital information.
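The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this section (64 channels, 31.25 kSps, 10 bits, $965\mu$W) and assumes that the transmitted stream contains the raw samples only, with no framing or synchronization overhead. This is an assumption made for the check, not a description of the actual link protocol, and the function names are illustrative.

```python
def aggregate_bit_rate(n_channels: int, sample_rate_sps: float, bits_per_sample: int) -> float:
    """Raw bit rate (bit/s) of N channels multiplexed in the digital domain."""
    return n_channels * sample_rate_sps * bits_per_sample


def power_per_channel(total_power_w: float, n_channels: int) -> float:
    """Total chip power divided evenly by the number of channels, in watts."""
    return total_power_w / n_channels


if __name__ == "__main__":
    rate = aggregate_bit_rate(n_channels=64, sample_rate_sps=31.25e3, bits_per_sample=10)
    ppc = power_per_channel(total_power_w=965e-6, n_channels=64)
    print(f"Aggregate raw data rate: {rate / 1e6:.1f} Mbps")
    print(f"Power per channel: {ppc * 1e6:.2f} uW")
```

Under this assumption the raw payload of 64 channels matches the 20-Mbps telemetry rate, and the total dissipation corresponds to about $15\mu$W per channel, consistent with the values quoted above.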

Within the corresponding chapter, particular attention is given to the ADC design. Designing a converter to be replicated in each channel required the optimization of both power consumption and area, and thus a clear identification of the best design choices to pursue those targets. The conventional binary weighted array successive approximation register (SAR) analog-to-digital converter (ADC) is the topology commonly adopted to achieve high-efficiency conversion (i.e. less than 10 fJ/conversion-step), even though it requires extra effort to design and simulate full-custom fF or sub-fF capacitors, which are often an uncomfortable option for a designer who is not specialized in converters. This work presents the design and the optimization criteria of a fully-differential SAR ADC achieving an efficiency similar to that of conventional binary weighted array converters but adopting standard MiM capacitors, thus greatly simplifying the design. Its efficiency of 6-fJ/conversion-step is comparable to the best results published so far and is the lowest among ADCs in 130-nm or less scaled technologies. The ADC core occupies an active area of only $0.045 mm^2$.

# Chapter 2 Lorentz Force Magnetometer

There is a growing interest in three-axis magnetic field sensors for low-cost, low-power applications such as electronic compasses for smart mobile phones and portable devices. Microelectromechanical systems (MEMS) based Lorentz force magnetometers are attracting increasing interest due to the possibility of implementing 3-axis planar devices, with a linear full-scale of several tens of Gauss, absence of magnetic materials, and compatibility with inertial devices for multiparameter, single-process units. The performance of a magnetic field sensing system based on this kind of device, in terms of resolution, linear full-scale range and bandwidth, depends on the micromachined device, on its packaging, and on the readout electronics. Currently only a few works have addressed integrated electronics for Lorentz-force MEMS magnetometers [9].

Concerning the MEMS sensing element and its operation, several works exploited AC currents injected at the device resonance frequency to amplify the Lorentz-force induced motion by a high quality factor [10–14]. In other words the magnetic field induces, through the Lorentz force, an amplitude modulation (AM) of the resonant displacement of the MEMS suspended frame. Critical drawbacks of this kind of operation are the trade-off between the achievable bandwidth and resolution [15], the challenge of tracking the device resonance without large offsets, and the poor long-term stability of the sensitivity, affected by the Q-factor dependence on temperature. Current chopping was proposed as a technique to reduce offset and associated drifts [14].

Frequency modulated (FM) Lorentz-force magnetometers, first proposed in [16], have recently received attention. This solution has some advantages over AM magnetometers operated at resonance, namely no trade-off between bandwidth and quality factor and improved sensitivity stability with temperature [17]. However, even when the sensitivity is improved with micro-leverage solutions [18], the white noise performance is limited to 20 $\mu T/\sqrt{Hz}$ using a 6 V bias of the suspended mass and 4 mA of driving current. Assuming a circuit supply of 3 V, this would turn into more than 12 mW of overall power dissipation, which is largely out of the specifications of consumer applications. Furthermore, long-term offset stability is affected by the frequency changes with temperature.

Another way to overcome both the trade-off between resolution and bandwidth, and the long-term stability issues associated with resonant operation, is to drive the magnetometer slightly off-resonance, as often adopted in MEMS gyroscopes [19–21]. In this operation mode, the scale-factor becomes less dependent on the quality factor. Moreover, the scale-factor dependence on resonance frequency variation induced by temperature changes is almost irrelevant. The drawback of this solution is a considerable sensitivity loss: the residual gain with respect to DC operation can be in the order of 50, instead of typical Q values between some hundred and a few thousand.

Figure 2.1: SEM photograph of the device showing the diamond-shaped tuning fork, the current recirculation concept, implemented through 10 metal coils deposited over the springs, and the parallel plate cells, used for the capacitive readout and tuning.

This work presents a z-axis magnetometer operated off its 20 kHz resonant frequency, with a metal multi-loop structure that helps amplify the Lorentz current effect and recover a high sensitivity in off-resonance operation. The device exploits a direct metal-on-polysilicon deposition step, and it is fabricated with a standard industrial technology process.

The device is coupled to an integrated circuit for the readout, designed in a 0.35-$\mu$m technology, which includes a capacitive-sensing front-end with programmable-gain amplifiers, a mixer and a low-pass filter for signal demodulation. The circuit provides the biasing for the capacitive stators (2 V) and requires a current of 150 $\mu$A from a 3 V supply. The overall power dissipation, including both the Lorentz current (107 $\mu$A<sub>rms</sub>) and the circuit biasing, is only 775 $\mu$W.
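As a quick consistency check on this figure, and assuming that the Lorentz current is drawn from the same 3 V supply as the circuit (an assumption, since the driver is not detailed at this point), the power budget can be sketched as:

$$P \approx (150~\mu\mathrm{A} + 107~\mu\mathrm{A}) \times 3~\mathrm{V} \approx 771~\mu\mathrm{W},$$

which is in line with the quoted value of about 775 $\mu$W.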

In these conditions, a sub-400 nT/ $\sqrt{\text{Hz}}$ resolution is obtained, with package pressures in the order of 0.7 mbar and for a frequency mismatch up to 200 Hz, where the noise contributions of the device and the electronics are designed to match. This yields a bandwidth of more than 50 Hz, compatible with the requirements of consumer applications. Large full-scale ranges can be obtained by tuning the driving current. The linearity error, limited by circuit saturation, is lower than 2% for magnetic fields up to $\pm 2.4$ mT, exceeding the full-scale of current devices based on the Hall effect or on anisotropic magneto-resistance (AMR).

The chapter is organized as follows. Section 2.1 describes the characteristics of the designed magnetometer in both the mechanical and the electrical domains, and derives the intrinsic device-limited noise. Section 2.2 describes the architecture of the integrated circuit realized for the readout of the MEMS sensor, discussing the noise limits. In Section 2.3, the measured performance is reported for some key circuit blocks and for the overall sensing system. Finally, conclusions and perspectives on the driving circuit are drawn in Section 2.4.

### 2.1 Magnetometer design

This section describes the Lorentz-force-based MEMS magnetometer designed for z-axis magnetic field sensing. With respect to previous implementations in the same process [13], the device exploits a tuning fork geometry, to reject accelerations and vibrations, following what is implemented in gyroscopes [22], and the current recirculation loop concept [12], to amplify the sensitivity. The sensor is fabricated using the Thick Epitaxial Layer for Microactuators and Accelerometers (ThELMA) process from STMicroelectronics [13]. All the finite element method (FEM) simulation results shown were obtained using full 3-D geometries in *Comsol Multiphysics*, with at least five elements within the smallest dimension of every structural part.

#### 2.1.1 Mechanical simulations and design

The mechanical part of the sensor is made of two springs, each formed by ten beams of length L, holding a suspended frame (rotor). The beams of each spring are connected through thin links. The frame faces two nested pairs of fixed electrodes, defining a set of differential parallel plate capacitors. Fig. 2.1 is a scanning electron microscope (SEM) photograph of the device. As described in more detail in the following, the current *i* flows through the two (top and bottom) springs in opposite directions, so that a z-component of the magnetic field gives rise to in-plane, opposite Lorentz forces F: the frame is therefore split into two symmetric sub-frames, coupled through a tuning-fork spring. The presence of the tuning fork determines the existence of both an in-phase and an anti-phase in-plane translational mode. The opposite direction of the Lorentz current excites the anti-phase mode, whose shape is indicated in the eigen-frequency FEM simulation of Fig. 2.2a. The detection of the magnetic field B is obtained through suitably arranged differential parallel plates formed by the fixed electrodes facing the frame.

On the contrary, an external acceleration causes concordant forces on the two sub-frames (see the in-phase mode FEM simulation, Fig. 2.2b). In this way, the action generated by the magnetic field results in a differential signal, while external accelerations are ideally rejected and, to first order, do not produce any differential capacitance variation. This approach is to be pursued because the Lorentz force (like the Coriolis force in gyroscopes) can be orders of magnitude smaller than inertial forces.

In order to further decrease effects of accelerations and vibrations, the in-phase mode has a resonance frequency higher than the anti-phase mode. This is obtained thanks to the specific geometry of the tuning fork, formed by a diamond spring and two holding bars [23]. As shown in the two insets of Fig. 2.2, while the anti-phase motion excites the first mode of the holding bars, the in-phase motion of the sub-frames excites the second mode of the holding bars, and is therefore shifted upwards. The springs and the tuning fork geometry are designed to set the anti-phase drive mode  $f_0$  around 20 kHz, while the in-phase mode falls at about 43 kHz.

#### 2.1.2 Electrical simulations and design

As described in the Introduction, off-resonance operation implies a decrease of the structure motion compared to resonant operation. This signal loss can be recovered if the current is re-used multiple times through recirculation loops: e.g. for N = 10 current loops, a sensitivity increase by a factor of 10 is directly obtained.

The definition of the recirculation path required a preliminary evaluation of the technological options available for the deposition of metal layers on the structural polysilicon used for the micro-fabrication. Two options were investigated: (i) the deposition of metal paths isolated from the structural layer through a barrier material and (ii) the deposition of metal paths directly on the polysilicon layer. The former option has the obvious advantage of allowing an optimum definition of the current paths, completely decoupling the electrical domain from the mechanical domain. However, it proved to be technologically challenging due to residual stresses on the different materials forming the stack. The second option was therefore selected. The challenge in the design of multiple loops within a standard industrial process, like the one adopted in this work, is thus represented by the absence of an insulating layer between the metal (deposited to form the loops) and the heavily doped polysilicon structural layer.

Figure 2.2: Results of FEM simulations for (a) the anti-phase mode, excited by the Lorentz current flowing in opposite directions through the springs, and for (b) the in-phase mode, excited by external accelerations. The insets show how the tuning-fork geometry, coupled to the holding bars, helps in shifting the in-phase mode to frequencies higher than for the anti-phase mode.

This challenge can be understood by looking again at Fig. 2.1: the current enters the loops at the top-left corner, as shown (IN). The ideal path, represented by the metal deposition, is indicated by the solid arrows (shown only for the first two loops) and is based on recirculation, with an exit at the central-left corner after 10 loops (OUT). However, as there is no isolation between the metal and the structural polysilicon layer, leaky paths can bypass the desired current flow. The device design therefore also requires electrical simulations to predict and optimize the current effectiveness. These simulations account for the nominal sheet resistance of the 700-nm-thick Al metal layer ($0.04~\Omega/\Box$), of the 22-$\mu$m-thick polysilicon layer ($20.5~\Omega/\Box$), and for the nominal metal-over-poly contact resistance per unit area.

Figure 2.3: Simulation for the Lorentz current flowing through the metal loops from the input (x-axis left end) to the output (x-axis right end). The curves refer to different metal widths and cross-sections or to different link geometries. The partial current loss in the first loop decreases if the resistance of the Al path is decreased (wide metal), or when the links resistance increases (serpentines).

Using the same geometry and software adopted for the mechanical-domain simulations, electrical-domain simulations were performed by applying a fixed voltage difference between the multiple-loop ends. To minimize current leakages, the metal path was designed to have the lowest possible resistance by increasing the path width, compatibly with the beam width and with the sizing of the anti-phase mode resonance frequency, to be set at 20 kHz. In the optimum situation, the nominal beam width is 6 $\mu$m, with a nominal metal width on top of it of 5.7 $\mu$m and an expected rectangular cross-section. A 0.15 $\mu$m per side enclosure of Al within polysilicon was adopted after discussing with process engineers. For the given data, Fig. 2.3 reports the current density through the metal loops as a function of the space coordinate along the current path. The reader can note, for the nominal 5.7 $\mu$m width, a 19% current decrease within the first loop, occurring through the leaky path represented by the thin, straight, polysilicon links between the spring beams. No significant further decreases are observed along the other loops, as they partly regain the current lost by the first loop. This 19% loss thus reduces the number of effective loops to an average value $N_l \approx 8.1 < N$ (dashed line).
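The effective number of loops quoted above follows from a simple estimate. Assuming, as a simplification of the simulated profile, that after the first-loop loss all $N = 10$ loops carry on average the same reduced current, one obtains:

$$N_l \approx N\,(1 - 0.19) = 10 \times 0.81 = 8.1.$$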

Fig. 2.4a shows a SEM photograph of the metal-over-poly structure actually obtained after the device fabrication. An etching of the metal paths larger than expected leads to a metal width in the order of $3.1~\mu$m. According to the simulations above, repeated for the actual metal geometry (triangular cross-section with a 500-nm height, see Fig. 2.4b), the undesired current paths reduce the effective number of loops to a value $N_l \approx 5.1$. This is the value that will be used for sensitivity and resolution predictions in the remainder of this work. For future implementations, the metal mask will be widened to compensate for the excessive Al etching and to minimize the loop resistance. Further, alternative geometries of the thin polysilicon links, like serpentine structures, are under investigation to increase the resistance of the leaky path, so as to raise the loop efficiency up to $N_l \approx 9.3$ (see again Fig. 2.3 for the predictions on this new geometry, with the serpentine structure shown in the inset).

#### 2.1.3 Device packaging and overall dimensions

The device has an overall area (excluding the interconnections to the pads) of 1700  $\mu$ m x 750  $\mu$ m. Other parameters are summarized in Table 2.1. The magnetometer is provided with tuning electrodes, used for offset minimization as described in [24].

Figure 2.4: Details of a SEM photograph showing the metal-over-poly structure at one spring end (a), and the corresponding geometry for the post-fabrication simulations (b).

| Parameter                                         | Value                                     |
|---------------------------------------------------|-------------------------------------------|
| Spring length $(L)$                               | $1400~\mu\mathrm{m}$                      |
| Nominal thickness $(h)$                           | $22~\mu\mathrm{m}$                        |
| Nominal air gap $(x_0)$                           | $2.1~\mu\mathrm{m}$                       |
| Nominal rest capacitance $(C_0)$                  | $420~\mathrm{fF}$                         |
| Nominal package pressure $(p)$                    | $0.75~\mathrm{mbar}$                      |
| Beam stiffness (from FEM) $(k)$                   | $80~\mathrm{N/m}$                         |
| Calculated effective mass $(m)$                   | $5{\cdot}10^{-9}~\mathrm{kg}$             |
| Calculated damping coefficient $(b)$              | $8{\cdot}10^{-7}~\mathrm{N/(m/s)}$        |
| Anti-phase resonance frequency (from FEM) $(f_0)$ | $20.35~\mathrm{kHz}$                      |
| Calculated anti-phase quality factor $(Q)$        | $460$                                     |
| In-phase resonance frequency (from FEM) $(f_i)$   | $43.26~\mathrm{kHz}$                      |

Table 2.1: Device parameters

The sensor is packaged at relatively low pressure (0.75 mbar, the same used for other inertial sensors in this process) to minimize the damping coefficient. In off-resonance operation, this in turn enables thermomechanical noise floor reduction down to the minimum level allowed by the technology without reducing the bandwidth [15], as later described in Section 2.1.5. Considering the nominal 2.1 $\mu$m gap between parallel plates, the Knudsen number can be calculated to be well above 10; the corresponding gas regime is thus the free-molecule flow one [25]. In this regime, the dissipation can be assessed by taking into account the interaction between the gas molecules and the device sidewalls, without accounting for the molecule-to-molecule interaction, using boundary models like those discussed in [26]. The quality factor so calculated for the anti-phase mode turns out to be in the order of 460.
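The Knudsen number estimate can be reproduced with a minimal sketch based on the kinetic-theory (hard-sphere) mean free path. The temperature of 300 K and the effective molecular diameter of 0.37 nm (air-like gas) are assumptions introduced here for illustration; only the pressure and the gap come from the text. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def mean_free_path(pressure_pa, temperature_k=300.0, diameter_m=3.7e-10):
    """Hard-sphere mean free path; temperature and diameter are assumed values."""
    cross_section = math.sqrt(2.0) * math.pi * diameter_m ** 2
    return K_B * temperature_k / (cross_section * pressure_pa)


def knudsen_number(pressure_pa, gap_m):
    """Mean free path divided by the characteristic length (the plate gap)."""
    return mean_free_path(pressure_pa) / gap_m


# 0.75 mbar = 75 Pa (package pressure), 2.1 um nominal gap (Table 2.1).
kn = knudsen_number(pressure_pa=75.0, gap_m=2.1e-6)
print(f"Knudsen number: {kn:.1f}")
```

Under these assumptions the mean free path is several tens of micrometres, so the Knudsen number comes out at a few tens, consistent with the free-molecule regime stated above.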

#### 2.1.4 Prediction and validation of the sensitivity

The mechanical sensitivity, i.e. the differential capacitance variation per unit magnetic field change,  $S_m = dC_d/dB$ , can be estimated as:

$$S_m = \frac{dx}{dB}\frac{dC_d}{dx} = \frac{dF}{dB}\frac{dx}{dF}\frac{dC_d}{dx} = N_l L_{eff}i_0 \cdot \frac{Q_{eff}}{k} \cdot \frac{2C_0}{x_0}, \tag{2.1}$$

where  $L_{eff}$  is the effective length of the springs, that takes into account that the Lorentz force is distributed along the springs and cannot be represented by a point-like force applied to the mass (with our geometry,  $L_{eff} \simeq L/2$ ),  $i_0$  is the AC amplitude of the driving current,  $Q_{eff}$  is the effective Q-factor, that represents the displacement amplification with respect to DC operation, k is the spring constant of the device,  $C_0 = N_r \varepsilon_0 A/x_0$  is the DC capacitance between the rotor and each stator,  $N_r$  being the number of differential parallel plate readout cells,  $\varepsilon_0$  being the permittivity inside the package (assumed as that of vacuum), and A their facing area. Finally  $x_0$  is the gap between the rotor and the stators.

For off-resonance mismatches larger than the MEMS intrinsic bandwidth  $\Delta f = f_0 - f_d \gg f_0/(2Q)$ , the effective Q-factor can be expressed as  $Q_{eff} = f_0/(2\Delta f)$  [15]. The mechanical sensitivity can be thus evaluated as:

$$S_m = N_l L_{eff} i_0 \frac{f_0}{k\Delta f} \frac{N_r \varepsilon_0 A}{x_0^2}. \tag{2.2}$$

The theoretically-calculated value, normalized to the AC current amplitude, is  $S_{m,i} = S_m/i_0 = 0.87 \text{ zF}/(\text{nT}\cdot\text{mA})$  for an off-resonance offset  $\Delta f = 200$  Hz. Eq. 2.2 is derived by assuming that the sensor is driven by a harmonic current with amplitude  $i_0$  at a frequency  $f_d = f_0 - \Delta f$ . The differential capacitance variation will be also an AC signal, at the same frequency, with an amplitude equal to  $B \cdot S_m$ .
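As a rough numerical check of Eq. 2.2, the sketch below plugs in the nominal values of Table 2.1 together with the post-fabrication $N_l \approx 5.1$ and $L_{eff} = L/2$. The function name and the unit-conversion constant are introduced here for illustration, and $C_0 = N_r \varepsilon_0 A / x_0$ is used to replace the plate geometry with the tabulated rest capacitance.

```python
def normalized_sensitivity(n_loops, l_eff, f0, k, delta_f, c0, x0):
    """Eq. 2.2 divided by i0, in F/(T*A), with C0 = Nr*eps0*A/x0."""
    return n_loops * l_eff * f0 / (k * delta_f) * c0 / x0


# 1 F/(T*A) = 1e21 zF / (1e9 nT * 1e3 mA) = 1e9 zF/(nT*mA)
F_PER_TA_TO_ZF_PER_NT_MA = 1e9

s_mi = normalized_sensitivity(
    n_loops=5.1,         # effective loops after fabrication
    l_eff=1400e-6 / 2,   # L_eff ~ L/2, in metres
    f0=20.35e3,          # anti-phase resonance, Hz
    k=80.0,              # stiffness, N/m
    delta_f=200.0,       # off-resonance offset, Hz
    c0=420e-15,          # rest capacitance, F
    x0=2.1e-6,           # gap, m
)
s_mi_zf = s_mi * F_PER_TA_TO_ZF_PER_NT_MA
print(f"S_m/i0 = {s_mi_zf:.2f} zF/(nT*mA)")
```

With these rounded inputs the result lands at about 0.9 zF/(nT·mA), close to the 0.87 zF/(nT·mA) quoted above; the small gap is plausibly due to the $L_{eff} \simeq L/2$ approximation and to the rounding of the tabulated parameters.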

For a validation of the predicted sensitivity, before coupling the device to the integrated circuit discussed in the following, a characterization of the inherent device performance was done using a low-noise setup based on discrete electronics and an external lock-in amplifier (SRS830 from Stanford Research Systems). The discrete electronics relies on a pair of charge amplifiers followed by an instrumentation amplifier: the reader can refer to [21] for details about this discrete circuit implementation.

The sensitivity was measured inside a Helmholtz coil magnetic field generator from Micromagnetic Inc., after compensating for the Earth field. Fig. 2.5 shows the experimental results (blue line, with markers) when sweeping the field between -2.4 mT and 2.4 mT, together with their best linear fitting (green dotted line). The measurements were carried out with an AC current amplitude of 58  $\mu$ A. The measured normalized sensitivity is 0.75 zF/(nT·mA), showing a 15% deviation from predictions. The difference may be ascribed to deviations between nominal and obtained parameters, e.g. the gap etching between parallel plates, or the metal-over-poly contact resistance. The inset of Fig. 2.5 also shows the linearity error, defined as the deviation of the measurements from the best fitting line normalized to the full scale range (FSR). As shown, over the whole FSR of  $\pm 2.4$  mT, the linearity error is lower than 0.6% (actually limited by noise rather than real sensor non-linearity).



Figure 2.5: Mechanical sensitivity of the magnetometer. The blue squares correspond to measurements, the green dotted line is the best linear fitting. The inset shows the linearity error over the full scale range ( $\pm 2.4$  mT).

#### 2.1.5 Prediction of thermo-mechanical noise

Assuming, initially, that resolution is limited by the thermomechanical noise of the device, the minimum detectable magnetic field per unit bandwidth is given by [15]:

$$\sigma_B = \frac{2}{N_l L_{eff} i_0} \sqrt{k_b T b},\tag{2.3}$$

where $k_b$ is the Boltzmann constant, T is the absolute temperature and b is the damping coefficient. As mentioned in Section 2.1.3, b was estimated with the model proposed in [25,26]. At pressures in the order of 0.75 mbar, the system operates in the free-molecule flow regime. Assuming that squeeze-film damping between parallel plates dominates, a damping coefficient per unit facing area $b_a = b/A = 4.9 \text{ (N/(m/s))/m}^2$ was used. The resolution, normalized to the AC current amplitude, i.e. $\sigma_{B,i} = \sigma_B \cdot i_0$, can be estimated as $\sigma_{B,i} = 33 \text{ nT} \cdot \text{mA}/\sqrt{\text{Hz}}$.

The output noise, expressed as the rms of random fluctuations of the differential capacitance per unit bandwidth, can be estimated as the product of the equivalent magnetic field resolution and the sensor sensitivity:

$$\sigma_C = \sigma_B \cdot S_m = \sqrt{4k_b T b} \frac{Q_{eff}}{k} \frac{dC_d}{dx}. \tag{2.4}$$

With the described device and process parameters, this can be evaluated as  $\sigma_C = 30 \text{ zF}/\sqrt{\text{Hz}}$ , independently of the driving current. This value was taken into account as the target input-referred capacitance noise for the ASIC design described in the following section.
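Eq. 2.3 and Eq. 2.4 can be evaluated with a short sketch using the parameters of Table 2.1, $N_l \approx 5.1$, $L_{eff} = L/2$ and $Q_{eff} = f_0/(2\Delta f)$ at $\Delta f = 200$ Hz. The temperature of 300 K is an assumption (it is not stated in the text), and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def force_noise(b, temperature_k=300.0):
    """Thermomechanical force noise sqrt(4*kB*T*b), in N/sqrt(Hz)."""
    return math.sqrt(4.0 * K_B * temperature_k * b)


def field_noise_per_current(n_loops, l_eff, b, temperature_k=300.0):
    """Eq. 2.3 multiplied by i0, in T*A/sqrt(Hz)."""
    return force_noise(b, temperature_k) / (n_loops * l_eff)


def capacitance_noise(b, q_eff, k, c0, x0, temperature_k=300.0):
    """Eq. 2.4 with dC_d/dx = 2*C0/x0, in F/sqrt(Hz)."""
    return force_noise(b, temperature_k) * q_eff / k * 2.0 * c0 / x0


B_DAMP = 8e-7                # damping coefficient, N/(m/s)
Q_EFF = 20.35e3 / (2 * 200)  # f0 / (2*delta_f)

sigma_b_i = field_noise_per_current(n_loops=5.1, l_eff=700e-6, b=B_DAMP)
sigma_c = capacitance_noise(b=B_DAMP, q_eff=Q_EFF, k=80.0, c0=420e-15, x0=2.1e-6)

# 1 T*A = 1e9 nT * 1e3 mA; 1 F = 1e21 zF
print(f"sigma_B,i = {sigma_b_i * 1e12:.1f} nT*mA/sqrt(Hz)")
print(f"sigma_C   = {sigma_c * 1e21:.1f} zF/sqrt(Hz)")
```

Under these assumptions the two values come out near 32 nT·mA/$\sqrt{\text{Hz}}$ and 29 zF/$\sqrt{\text{Hz}}$, in line with the 33 nT·mA/$\sqrt{\text{Hz}}$ and 30 zF/$\sqrt{\text{Hz}}$ quoted in the text.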

### 2.2 Front-end electronics



Figure 2.6: Block diagram of the sensing readout circuit (a) and a simplified single-ended scheme of the amplifier (b).

AM off-resonance operation is characterized by the same signal-to-noise ratio (SNR) as AM resonant sensing, but by a lower sensor gain [21]. From an electronics design perspective, this implies both a lower input signal and a lower thermomechanical noise (e.g. in terms of rms capacitance variation), as defined by Eq. 2.2 and Eq. 2.4. In order not to worsen the SNR, the analog integrated front-end must therefore reach noise levels lower than (or at least comparable to) the device noise, even in this demanding situation. This represents a big challenge in terms of power-noise trade-off.

In the specific case of the designed circuit, in order not to degrade the SNR of the given sensor, the target input-referred noise of the front-end had to be lower than $30~\mathrm{zF}/\sqrt{\mathrm{Hz}}$, the value derived in the previous section (Eq. 2.4).

A generic scheme describing the architecture of the capacitive readout circuit is depicted in Fig. 2.6a, together with a basic single-ended implementation (Fig. 2.6b). The sensor, biased by a proper network, is modeled as a capacitance whose value varies at the driving frequency imposed by the driver circuit, with an amplitude set by the magnetic field value. The first mixer, depicted in the sensor block, is fictitious: it accounts for the modulated regime determined by the AC current injected into the sensor by the driving circuit.

The amplifying stage features cut-off frequencies that lie far enough from the operating frequency not to impair the gain; a direct down-conversion mixer then demodulates the signal into the baseband. A final low-pass filter sets the system bandwidth. The amplifier, which will be described in more detail later in the chapter, is a cascade of a low-noise transcapacitance stage and a charge amplifier. As shown in the simplified single-ended scheme of Fig. 2.6b, the sensor biasing is guaranteed by the same amplifier thanks to the feedback configuration, which is active in DC and sets the stator voltage at the desired $V_{bias}$. For high values of the feedback resistors $R_f$, the signal current is integrated on the first feedback capacitance to produce the output voltage of the first amplifier. The second stage acts as a band-pass charge amplifier whose gain is set by the capacitive ratio between $C_2$ and $C_f$. Note the presence of a capacitance connected to the virtual ground of the low-noise amplifier. It is modeled as the sum of the MEMS rest capacitance $C_{mems}$ and the parasitic capacitance $C_{par}$ introduced at the interface, and it must be taken into consideration to guarantee the stability and the noise requirements of the readout circuit. Its impact on the front-end performance will be discussed later.



Figure 2.7: Schematic representation of the driving and the readout circuits.

The implemented capacitive readout circuit is described in the following and is based on the version proposed in [9,27]. With respect to the referenced works, the circuit features custom-designed pads to minimize their parasitic capacitance to ground and the associated noise penalty. Further, gain and filtering stages were tailored to the nominal performance of the device of this work. Finally, the integrated circuit shown here also features a Pierce oscillator to be coupled to a MEMS resonator, for the implementation of the reference driving frequency. All the circuits are operated under a 3-V supply voltage.

A simplified schematic of the system is shown in Fig. 2.7: the readout chain is composed of a cascade of two band-pass amplifiers, a down-conversion mixer, an instrumentation amplifier that converts the signal from fully-differential to single-ended, and a $g_m$-C filter setting the final bandwidth of the system.

#### 2.2.1 Amplification stages

The first stage is a low-noise amplifier (LNA) implemented as a fully-differential charge amplifier with input nodes directly connected to the magnetometer stators. The rotor is assumed to be at ground potential, while the stators are biased at the common-mode input voltage $V_{bias}$ of the LNA that, in turn, is equal to its output common-mode voltage. This voltage value can be externally tuned in order to properly bias the magnetometer and to maximize the signal gain of the readout electronics. Indeed $V_{bias}$ directly determines the LNA differential output voltage:

$$v_{o1}\left(t\right) = \frac{V_{bias}}{C_F} C_d\left(t\right),\tag{2.5}$$

where  $C_d$  is the differential capacitance variation of the MEMS structure, and  $C_F = 25$  fF is the feedback capacitance of the amplifier. The  $C_F$  value together with  $V_{bias}$  sets the gain. The parasitic capacitance between the input and the output of the amplifier is estimated to be less than 2 fF.
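For a sense of scale, with the 2 V stator bias quoted in the chapter introduction, Eq. 2.5 gives a capacitance-to-voltage gain of

$$\frac{V_{bias}}{C_F} = \frac{2~\mathrm{V}}{25~\mathrm{fF}} = 80~\mu\mathrm{V/aF},$$

so that the 30 zF/$\sqrt{\text{Hz}}$ target derived in Eq. 2.4 corresponds to roughly 2.4 $\mu$V/$\sqrt{\text{Hz}}$ at the LNA output.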



Figure 2.8: Simplified single-ended electrical scheme of the amplifying chain, highlighting the internal two-stage architecture of the OTAs of the two amplifiers.

This topology was preferred to a transresistance stage not only for its lower noise, but also for the dependence of the gain on a capacitance value, that can be integrated more easily than the high resistance value required by an equivalent transresistance amplifier.

The feedback resistors are implemented by two PMOS transistors biased in sub-threshold region [28,29] providing a resistance  $R_{PMOS}$  larger than 10 G $\Omega$  and a closed loop cut-off frequency approximately equal to:

$$f_{cut} = \frac{1}{2\pi R_{PMOS}C_F} \approx 600 \,\mathrm{Hz}. \tag{2.6}$$

This frequency corresponds to the lower pole of both the transfer function between the input current and the output voltage of the LNA (a low-pass pole) and of the voltage transfer function of the second amplifier (a high-pass pole). Setting the pole value below 1 kHz, the differential capacitance variations around 20 kHz ( $C_d(t) = B \cdot S_m \sin(2\pi f_d t)$ ) are effectively amplified as in Eq. 2.5. Furthermore,  $f_{cut}$  is also the low-pass cut-off frequency of the feedback pseudo-resistor noise, whose contribution in the proximity of the sensor operating frequency is then negligible.
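The numbers above can be checked with a minimal first-order sketch. It assumes $R_{PMOS}$ exactly equal to 10 G$\Omega$ (the text only gives a lower bound) and models the capacitance-to-voltage response of the LNA as a single-pole high-pass normalized to the mid-band gain of Eq. 2.5; both the model and the function names are illustrative assumptions.

```python
import math


def corner_frequency(r_ohm, c_farad):
    """Eq. 2.6: pole set by the pseudo-resistor and the feedback capacitance."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def high_pass_magnitude(f_hz, f_corner_hz):
    """First-order high-pass magnitude, normalized to the mid-band gain."""
    ratio = f_hz / f_corner_hz
    return ratio / math.sqrt(1.0 + ratio ** 2)


f_cut = corner_frequency(r_ohm=10e9, c_farad=25e-15)
gain_at_drive = high_pass_magnitude(20e3, f_cut)

print(f"f_cut = {f_cut:.0f} Hz")
print(f"relative gain at 20 kHz = {gain_at_drive:.4f}")
```

With these values the corner falls at roughly 640 Hz and the gain loss at the 20 kHz operating frequency is well below 0.1%, which is why Eq. 2.5 can be used directly in band.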

#### 2.2.2 Noise analysis

The electronic noise at the output of the overall circuit is mainly due to the input differential pair of OTA1. The noise power spectral density at the LNA output is [30]:

$$\overline{E_{n,out}^2} \simeq \overline{E_{n,eq}^2} \left( 1 + \frac{C_T}{C_F} \right)^2, \tag{2.7}$$

where  $C_T$  is the total capacitance seen from any OTA1 input node to ground and it is given by the sum of the MEMS rotor-to-stator DC capacitance  $C_0$  and the parasitic capacitance  $C_P$ .  $\overline{E_{n,eq}^2}$  is the input-referred OTA1 noise power spectral density, dominated by the white noise contribution and equal to [31]:

$$\overline{E_{n,eq}^2} = \frac{8k_B T\gamma}{g_{m_{1,2}}} \left(1+\alpha\right),\tag{2.8}$$

where  $g_{m_{1,2}}$  is the transconductance of the transistors of the input pair. The coefficient  $\alpha$  is the sum of the ratios between the transconductances of the other transistors of OTA1 (mirror, second-stage...) and that of the input pair. To maximize their efficiency  $g_m/I$ , the input devices M1 and M2 were biased in weak-inversion, very near to the sub-threshold region. On the contrary, the load transistors M8 and M9 were biased in strong inversion. In such a bias condition,  $\alpha \ll 1$  and Eq. 2.8 can be written as follows:

$$\overline{E_{n,eq}^2} \simeq \frac{4k_B T n^2 U_T}{I_{bias}},\tag{2.9}$$

where n is the sub-threshold slope coefficient,  $U_T$  is the thermal voltage and  $I_{bias}$  is the current flowing in each of the input devices.
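The step from Eq. 2.8 to Eq. 2.9 can be made explicit under the standard weak-inversion expressions, which are assumed here since they are not spelled out in the text: $g_{m_{1,2}} = I_{bias}/(nU_T)$ and $\gamma = n/2$. Neglecting $\alpha$, one obtains:

$$\overline{E_{n,eq}^2} \simeq \frac{8k_B T\gamma}{g_{m_{1,2}}} = 8k_B T \cdot \frac{n}{2} \cdot \frac{nU_T}{I_{bias}} = \frac{4k_B T n^2 U_T}{I_{bias}}.$$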

Based on Eq. 2.5, the noise contribution of the sensing electronics to the overall resolution can be evaluated in terms of capacitance noise density:

$$\sigma_C \simeq \frac{\sqrt{\overline{E_{n,eq}^2}} \left(1 + \frac{C_T}{C_F}\right) C_F}{V_{bias}},\tag{2.10}$$

that can be compared to the corresponding sensor contribution expressed in Eq. 2.4. By substituting the expression in Eq. 2.9 into Eq. 2.10, also the capacitive noise contribution can be expressed as a function of the input transistors bias current:

$$\sigma_C \simeq \frac{\sqrt{4k_B T n^2 U_T} \left(1 + \frac{C_T}{C_F}\right) C_F}{V_{bias} \sqrt{I_{bias}}}, \tag{2.11}$$

clearly showing that the noise scales only with the inverse square root of $I_{bias}$: lowering the noise by a given factor requires raising the bias current by the square of that factor. For this reason it is not convenient, in terms of power efficiency, to bias the input transistors with a current larger than the value strictly needed to keep the amplifier noise contribution comparable to or slightly lower than the sensor contribution.
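The square-root dependence of Eq. 2.11 can be illustrated with the sketch below. The numerical inputs are taken from values quoted later in this section (a total input capacitance of about 3.5 pF from the pad parasitic plus the input-pair gate capacitance, $n = 1.65$, $V_{bias} = 2$ V); the temperature of 300 K, the choice of 25 $\mu$A per input device and the function name are assumptions made for illustration.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def amplifier_capacitance_noise(i_bias, c_t, c_f, v_bias, n, temperature_k=300.0):
    """Eq. 2.11: input-referred capacitance noise in F/sqrt(Hz)."""
    u_t = K_B * temperature_k / Q_E  # thermal voltage
    e_n = math.sqrt(4.0 * K_B * temperature_k * n ** 2 * u_t / i_bias)  # Eq. 2.9
    return e_n * (c_f + c_t) / v_bias  # (1 + C_T/C_F) * C_F = C_F + C_T


params = dict(c_t=3.5e-12, c_f=25e-15, v_bias=2.0, n=1.65)

sigma_25 = amplifier_capacitance_noise(i_bias=25e-6, **params)
sigma_100 = amplifier_capacitance_noise(i_bias=100e-6, **params)

print(f"25 uA per device : {sigma_25 * 1e21:.1f} zF/sqrt(Hz)")
print(f"100 uA per device: {sigma_100 * 1e21:.1f} zF/sqrt(Hz)")
```

Quadrupling the current only halves the noise, which is the power-efficiency argument made above. Under these assumptions the 25 $\mu$A case gives about 12 zF/$\sqrt{\text{Hz}}$, below the 30 zF/$\sqrt{\text{Hz}}$ sensor contribution.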

The minimum detectable magnetic field guaranteed by the sensing system can be then obtained by dividing the capacitive noise expressed in Eq. 2.10 by the sensor sensitivity  $S_m$ :

$$\begin{aligned}
B_{min} &\simeq \frac{\sqrt{\overline{E_{n,eq}^2}\, BW} \left(1 + \frac{C_T}{C_F}\right) C_F}{S_m V_{bias}} \\
&\simeq \frac{\sqrt{\overline{E_{n,eq}^2}\, BW}\, (C_P + C_0)}{S_m V_{bias}}.
\end{aligned} \tag{2.12}$$

The simplification above reasonably assumes $C_P + C_0 \gg C_F$. It follows that the parasitic capacitance $C_P$ must be minimized, or at least kept comparable to the MEMS capacitance, in order not to amplify the electronic noise.

Figure 2.9: Transistor-level implementation of OTA1, with its common-mode feedback network.

In the present design, the parasitic capacitance has been minimized by directly bonding the two dies and using a refined design of the PADs. Such custom PADs have an overall area of $(60~\mu\mathrm{m})^2$, 62% lower than for library PADs; they also feature custom PAD protections (reverse-biased diodes) smaller than the library ones; further, they rely only on top-level metals to minimize the parasitic capacitance towards the substrate. These efforts made it possible to keep $C_T$ lower than 2.5 pF, compared to a MEMS capacitance of 420 fF, while a standard PAD would have given a $C_P$ capacitance larger than 6 pF, as e.g. in [27].
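As a rough illustration based on Eq. 2.12, and taking the bounds quoted above at face value ($C_P + C_0 < 2.5$ pF with custom PADs, $C_P > 6$ pF plus $C_0 = 420$ fF with a standard PAD), the penalty of a standard PAD on the minimum detectable field can be estimated as:

$$\frac{B_{min}^{std}}{B_{min}^{custom}} \gtrsim \frac{6~\mathrm{pF} + 0.42~\mathrm{pF}}{2.5~\mathrm{pF}} \approx 2.6.$$

Any capacitance common to the two cases, not included in this simple estimate, would reduce this ratio.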

On the other hand, by increasing the bias voltage of the stators, the sensitivity improves without any impact on the electronic noise, thus reducing the equivalent magnetic noise floor due to the amplifier noise. With a first stage bias current equal to 50  $\mu$ A, the electronic capacitive noise contribution was kept below 30 zF/ $\sqrt{\text{Hz}}$ , which was the sensor noise derived in Eq. 2.4.

#### First amplifier sizing and design criteria

The internal architecture of the first Operational Transconductance Amplifier (OTA1) relies on a two-stage, Miller-compensated, fully differential topology. A simplified single-ended electrical scheme is depicted in Fig. 2.8 while its transistor level implementation is illustrated in Fig. 2.9 together with its common-mode feedback network. A pMOS input pair was chosen and sized large to reliably bias the stator while keeping flicker noise contribution well below white noise floor.

The first stage adopts a self-consistent common-mode feedback network similar to those presented in [5,32], but adapted to a higher power supply by the introduction of a pair of source followers (M3 and M4). These devices shift the gate bias voltage of the tail transistors M6 and M7 by a $V_{GS_{3,4}}$ with respect to the output nodes of the first stage, thus preventing any transistor from operating in the ohmic region. Since M3 and M4 draw less than one tenth of the current of the input transistors, their impact on the power consumption is negligible. With this topology the common-mode voltage of the output nodes of the first stage is set by the values of $V_{GS_{3,4}}$ and $V_{GS_{6,7}}$.

From the electrical point of view, the noise and bandwidth requirements, as well as the expected capacitive loads, set the required currents in each stage. The first stage current is usually set by noise constraints. As discussed in section 2.2.2, the electronic noise contribution of the amplifying chain is set by the input pair of OTA1. By inverting Eq. 2.11, the current that is necessary to bias the first stage can be derived:

$$I_{bias,1} \simeq \frac{8k_B T n^2 U_T \left(1 + \frac{C_T}{C_F}\right)^2 C_F^2}{(V_{bias} \sigma_C)^2} \tag{2.13}$$

To meet the requirement of an equivalent input capacitive noise of $30zF/\sqrt{Hz}$, with a PAD parasitic of 2.5pF, a MOS input pair gate capacitance of roughly 1pF, a stator bias voltage of 2V and an effective sub-threshold slope coefficient n equal to 1.65 (according to simulations), and considering the side contributions of the load transistors, the necessary bias current of the first stage is about $10\mu$A. To further reduce this contribution, the value was raised to $50\mu$A. This margin was taken because the integrated circuit and the sensor were developed in parallel, and the originally defined sensor performance was expected to guarantee a lower noise floor, and thus to require lower electronic noise.

On the other hand, the closed-loop bandwidth must be set so that the in-band gain of the amplifier is not impaired. The low-pass pole of the closed-loop configuration is given by the unity-gain frequency of the loop gain. The whole amplifier can be considered as a Miller topology whose dominant pole $f_{L,OTA1}$ (see Fig. 2.8) is set around 1kHz. Considering the attenuation of the feedback network composed of $C_F$ and $C_T$, and targeting a closed-loop pole of 120kHz (above the 20kHz sensor operating frequency), the GBWP of OTA1 must be 16.8MHz. This implies the adoption of a Miller capacitance $C_{C,OTA1}$ given by:

$$C_{C,OTA1} \simeq \frac{I_{bias,1}}{4\pi n U_T \, GBWP_{OTA1}} = 5.5\,pF \tag{2.14}$$

For stability, the second pole should be placed at twice the GBWP. Considering a 2.5pF output load represented by the input capacitance $C_2$ of the second amplifier, the necessary second stage current is given by:

$$I_{bias,2} \simeq 8\pi C_2 n U_T \, GBWP_{OTA1} \simeq 50\mu A \tag{2.15}$$

An additional current of  $2\mu A$  is necessary to bias the common mode feedback networks, for an overall current consumption of the LNA of  $102\mu A$ .
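As a numerical cross-check of Eqs. 2.13-2.15, the following Python sketch evaluates the three expressions with the values quoted above. It is only illustrative: the temperature (300 K), the feedback capacitance `C_F` (25 fF, chosen so that $1 + C_T/C_F$ is close to the ratio 16.8 MHz / 120 kHz = 140) and the choice of adding the 1 pF gate capacitance to the 2.5 pF PAD parasitic are assumptions, and all function names are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
Q_E = 1.602176634e-19   # elementary charge [C]


def thermal_voltage(temp_k=300.0):
    """Thermal voltage U_T = k_B*T/q in volts (300 K is an assumption)."""
    return K_B * temp_k / Q_E


def ota1_noise_current(sigma_c, c_t, c_f, v_bias, n, temp_k=300.0):
    """First-stage bias current from Eq. 2.13 (input pair contribution only)."""
    u_t = thermal_voltage(temp_k)
    # (1 + C_T/C_F)^2 * C_F^2 is algebraically (C_T + C_F)^2
    return 8 * K_B * temp_k * n**2 * u_t * (c_t + c_f) ** 2 / (v_bias * sigma_c) ** 2


def miller_cap(i_bias1, n, gbwp, temp_k=300.0):
    """Miller capacitance from Eq. 2.14."""
    return i_bias1 / (4 * math.pi * n * thermal_voltage(temp_k) * gbwp)


def second_stage_current(c_load, n, gbwp, temp_k=300.0):
    """Second-stage bias current from Eq. 2.15 (second pole at twice the GBWP)."""
    return 8 * math.pi * c_load * n * thermal_voltage(temp_k) * gbwp


N_SLOPE = 1.65            # sub-threshold slope coefficient from the text
C_T = 2.5e-12 + 1e-12     # ASSUMPTION: PAD parasitic plus input-pair gate capacitance
C_F = 25e-15              # ASSUMPTION: feedback capacitance, not stated in this section
GBWP = 16.8e6             # required gain-bandwidth product of OTA1 [Hz]

noise_gain = GBWP / 120e3                       # implied 1 + C_T/C_F
i_noise = ota1_noise_current(30e-21, C_T, C_F, 2.0, N_SLOPE)
c_c = miller_cap(50e-6, N_SLOPE, GBWP)
i_2 = second_stage_current(2.5e-12, N_SLOPE, GBWP)

print(f"noise gain   : {noise_gain:.0f}")
print(f"I_bias,1 min : {i_noise * 1e6:.1f} uA")
print(f"C_C,OTA1     : {c_c * 1e12:.2f} pF")
print(f"I_bias,2     : {i_2 * 1e6:.1f} uA")
```

Under these assumptions the input pair alone requires about 8 $\mu$A, the same order as the roughly $10\mu$A quoted once the load transistors are included, while Eqs. 2.14 and 2.15 return about 5.5 pF and 45 $\mu$A, in line with the values given above.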

The common mode feedback network of the first stage was previously described, while that of the second stage of OTA1 relies on an additional OTA with one input node connected to  $V_{bias}$ , which sets the reference voltage for the stator bias condition, and the other one connected to the output of the common-mode sense network  $V_{CMS}$ . The latter is implemented by two pairs of cross-coupled PMOS pseudo-resistors. This configuration balances the non-linear variations of the pseudo-resistor resistances that occur in presence of a differential signal, suppressing any related common-mode artifact that would otherwise arise at the output node  $V_{CMS}$  resulting in a drift of the common mode bias. This effect must be avoided since it could also reduce the output swing and compromise the system sensitivity and linearity.



Figure 2.10: Transistor-level implementation of OTA2, adopted for the capacitive amplifier, with its common-mode feedback network.

The stator voltage  $V_{bias}$  should be kept as high as possible to guarantee the maximum sensitivity. Within the 3-V supply, it can be raised up to 2 V, the maximum value allowed by the first stage of OTA1 to keep its tail transistors (M6 and M7) in the saturation region and thus to guarantee its correct biasing.

#### Second amplifier sizing and design criteria

The second amplifying stage, a capacitive amplifier, features a 40-dB gain using an input capacitance of 2.5 pF and a feedback capacitor of 25 fF. Fig. 2.10 shows its internal implementation (OTA2), which is again a two-stage, Miller-compensated topology whose simplified architecture appears in Fig. 2.8.

The first stage of OTA2 is similar to that of OTA1, but with the addition of a cascode configuration, implemented by transistors M5-M6. Its role is to reduce the Miller effect across the OTA2 input transistors, which would otherwise increase the equivalent capacitance at the OTA input nodes, thus lowering the loop gain and reducing the bandwidth of the capacitive amplifier. In the proposed topology, the cascode stage keeps the low-pass closed-loop pole of the amplifier at high frequency without consuming additional current. The same solution could have been adopted in OTA1 to save power. However, this choice was avoided in this first implementation, to allow tuning the input-pair bias current (from 50 $\mu$A to 500 $\mu$A) and the common-mode input voltage (from 1.5 V to 2.0 V) without pushing the OTA1 input transistors into the linear region.

Brief sizing criteria for OTA2 are provided in the following. Also in this case OTA2 can be considered as a two-stage Miller-compensated amplifier with a dominant pole $f_{L,OTA2}$ (see Fig. 2.8) at 1kHz. Here the noise requirement is loose, since the signal has already been amplified at this point of the chain, and the current of the first stage is set by the bandwidth constraint. For OTA2, the unity gain frequency was set at 200kHz, a decade above the sensor operating frequency, so as not to lower the final closed-loop gain. This implies the need for a GBWP of OTA2 above 20MHz; it was set to 30MHz. With a compensation capacitance of 350fF (including any gate-to-drain parasitics), the bias current of the input stage of OTA2 is:

$$I_{bias,1} \simeq 8\pi n U_T C_{C,OTA2} \, GBWP_{OTA2} \simeq 12\mu A \tag{2.16}$$

Similarly to the LNA case, the second pole of OTA2 must be kept at least at twice the gain-bandwidth product. Considering a capacitive load $C_{load,2}$ lower than 700fF, the second-stage bias current is given by:

$$I_{bias,2} \simeq 8\pi C_{load,2} n U_T \, GBWP_{OTA2} \simeq 14\mu A \tag{2.17}$$

The common mode feedback network of the second stage of OTA2 is similar to the one adopted in OTA1. It is used to set the capacitive amplifier output DC voltage close to 1.5 V, to maximize the swing. The overall OTA2 current consumption, including the common mode feedback network, is $32\mu$A, for a total current consumption of the amplifying stage of $134\mu$A, i.e. $402\mu$W from a 3V power supply.
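The OTA2 figures can be retraced with a short Python sketch. It is a hedged illustration only: the thermal voltage (25.85 mV, i.e. roughly 300 K) and the reuse of the slope coefficient n = 1.65 quoted for OTA1 are assumptions, and the function names are hypothetical.

```python
import math

U_T = 0.02585   # ASSUMPTION: thermal voltage near 300 K [V]
N_SLOPE = 1.65  # ASSUMPTION: same slope coefficient as quoted for OTA1


def closed_loop_gain_db(c_in, c_fb):
    """Gain of a capacitive amplifier, set by the capacitor ratio."""
    return 20 * math.log10(c_in / c_fb)


def stage_current(cap, gbwp, n=N_SLOPE, u_t=U_T):
    """Form shared by Eqs. 2.16 and 2.17: I = 8*pi*n*U_T*C*GBWP."""
    return 8 * math.pi * n * u_t * cap * gbwp


C_IN, C_FB = 2.5e-12, 25e-15
GBWP_OTA2 = 30e6

gain_db = closed_loop_gain_db(C_IN, C_FB)        # 40 dB
min_gbwp = 200e3 * (C_IN / C_FB)                 # 200 kHz times the gain of 100
i_first = stage_current(350e-15, GBWP_OTA2)      # Eq. 2.16
# Load capacitance that Eq. 2.17 associates with the quoted 14 uA
c_load_implied = 14e-6 / (8 * math.pi * N_SLOPE * U_T * GBWP_OTA2)

total_current = 102e-6 + 32e-6                   # LNA plus OTA2
total_power = total_current * 3.0                # 3 V supply

print(f"gain            : {gain_db:.1f} dB")
print(f"minimum GBWP    : {min_gbwp / 1e6:.1f} MHz")
print(f"I_bias,1 (OTA2) : {i_first * 1e6:.1f} uA")
print(f"implied C_load  : {c_load_implied * 1e15:.0f} fF")
print(f"amplifier power : {total_power * 1e6:.0f} uW")
```

With these inputs Eq. 2.16 gives slightly more than 11 $\mu$A, and the quoted $14\mu$A of Eq. 2.17 corresponds to a load of about 435 fF, consistent with the stated bound of 700fF.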

### 2.2.3 Downconversion and filtering

The amplified signal is then demodulated by a passive mixer, driven by a signal with the same phase and frequency as the AC current flowing into the sensor. This solution downconverts to baseband the magnetic field signal that was modulated at a frequency $f_d = f_0 - \Delta f$ by the drive current.

Figure 2.11: Schematic view of the $g_m$-C filter (a), with transistor-level implementation of the OTAs (b).

Finally, the mixer output is converted to a single-ended signal by a unity gain instrumentation amplifier (a buffer) before being filtered by a $2^{nd}$ order $g_m$-C filter with a 0dB nominal in-band gain, shown in Fig. 2.11a. A selective filter is desirable in order to eliminate the sensor noise around the resonance. Its cut-off frequency can be regulated between 10 and 150 Hz by tuning the $g_m$-C filter bias current. To guarantee a full power bandwidth equal to the filter bandwidth, the slew rate of the cell has been boosted by adopting the translinear OTA topology shown in Fig. 2.11b.

In detail, all the transistors of the $g_m$-C filter OTAs are biased in the subthreshold region. Their nominal tail bias current is equal to 40 nA, only a small fraction of which flows through M1 and M3 (approximately 500 pA, since their aspect ratio is 80 times smaller than that of M2 and M4). For small to moderate input signals, the bandwidth of the filter is determined by the ratio of the transconductance of M1, M3 and the geometric mean of the capacitors C1 and C2. The square root of the ratio between C1 and C2 also determines the quality factor of the poles synthesized by the circuit. Their values were set to 60 pF and 15 pF, respectively, to achieve a 50-Hz bandwidth with a quality factor equal to 0.5, low enough to avoid any peaking. In the presence of a large input signal, the transistors connected to the input with the negative swing turn off, letting the whole tail current charge or discharge the output node and thus providing a slew rate higher than the target value $2\pi \cdot 50$ Hz $\cdot V_{DD}/2 = 470$ V/s.
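The filter numbers above can be retraced with the following Python sketch. It is a rough check under stated assumptions: the quality factor is taken as $\sqrt{C_2/C_1}$ (the orientation of the ratio that reproduces the quoted 0.5), $V_{DD}$ is the 3-V supply, and the slewing node is assumed to be loaded by the larger capacitor as a worst case. The function names are hypothetical.

```python
import math


def quality_factor(c1, c2):
    """ASSUMPTION: Q = sqrt(C2/C1), which returns 0.5 for 60 pF and 15 pF."""
    return math.sqrt(c2 / c1)


def required_slew_rate(bandwidth_hz, amplitude_v):
    """Peak slope of a sine wave of the given amplitude at the band edge."""
    return 2 * math.pi * bandwidth_hz * amplitude_v


def slew_rate(current_a, cap_f):
    """Slew rate of a node of capacitance cap_f driven by current_a."""
    return current_a / cap_f


C1, C2 = 60e-12, 15e-12
I_TAIL = 40e-9            # nominal tail current
I_SMALL = I_TAIL / 80     # about 500 pA through M1 and M3
V_DD = 3.0                # ASSUMPTION: the 3-V supply of the readout chain

q = quality_factor(C1, C2)
c_mean = math.sqrt(C1 * C2)                          # geometric mean, 30 pF
sr_target = required_slew_rate(50.0, V_DD / 2)       # about 471 V/s
sr_boosted = slew_rate(I_TAIL, max(C1, C2))          # whole tail current available
sr_unboosted = slew_rate(I_SMALL, max(C1, C2))       # only the small-signal branch

print(f"Q = {q:.2f}, geometric mean = {c_mean * 1e12:.0f} pF")
print(f"target slew rate    : {sr_target:.0f} V/s")
print(f"boosted slew rate   : {sr_boosted:.0f} V/s")
print(f"unboosted slew rate : {sr_unboosted:.1f} V/s")
```

In this worst case the full tail current provides roughly 670 V/s, above the 470 V/s target, whereas the 500 pA branch alone would give less than 10 V/s, which illustrates why the slew-rate boosting is needed.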

### 2.3 Measurement results

Fig. 2.12 shows the microphotograph of the ASIC die, fabricated in a 0.35-$\mu$m CMOS process from AustriaMicroSystems (AMS). The ASIC core occupies an active area of 0.48 mm<sup>2</sup>. The readout chain used here is encircled in white, and the reader can note the reduced PAD area with respect to the twin implementation (located just above) used in [27]. The small area taken up by the Pierce oscillator is also highlighted. Fig. 2.13 shows the two stacked dies wire-bonded onto a socket carrier, mounted on the biasing PCB, which also provides the drive reference when the Pierce oscillator is not used.



Figure 2.12: ASIC die microphotograph with the highlighted circuit blocks. Note the reduced PAD area to minimize parasitic capacitances.

### 2.3.1 Sensitivity and bandwidth

The tests described in this Section were performed using the external drive reference. Fig. 2.14 shows experimental data, where the output voltage is plotted as a function of the input magnetic field, generated with the Helmholtz coil setup.

Measurements were performed biasing the MEMS stators at 2 V and driving the sensor with two different currents, to achieve different sensitivity values. The low sensitivity setting, using a driving current $i_0 = 58 \ \mu A = 41 \ \mu A_{rms}$, was chosen to span a maximum full-scale range of $\pm 2.4$ mT, while the high sensitivity configuration, using a driving current of 152 $\mu A = 107 \ \mu A_{rms}$, was chosen to reach a better resolution, with a lower FSR. The resulting z-axis system sensitivities are 230 mV/mT and 510 mV/mT, respectively. In both configurations, the maximum full-scale range was quoted up to the magnetic field value leading to a 2% linearity error. In both cases the range was limited by the non-linearity of the $g_m$-C filter, due to its sub-threshold operation.

Thanks to the off-resonance operating mode, the system bandwidth is set by the cut-off frequency of the $g_m$-C filter. A nominal 50 Hz cut-off can then be guaranteed and tuned between 10 and 150 Hz, avoiding the limit imposed by the intrinsic sensor bandwidth ($\approx 20$ Hz for this device) [9].

Figure 2.13: Photograph showing the wire bonding of the stacked MEMS and ASIC dies.

#### 2.3.2 System resolution and power consumption

Noise measurements were performed, and both the sole ASIC contribution and the overall sensing system (sensor and ASIC) noise performances were evaluated.

To derive the ASIC noise, the AC driving current was switched off and the signal was demodulated 1 kHz before resonance, so that the thermomechanical noise contribution is made negligible; noise was measured through the Allan variance method [33] over a 10 s time interval.

Fig. 2.15 shows the equivalent capacitive white noise due to the analog front-end as a function of its power consumption, which is varied only by changing the current of the first stage of OTA1. Since the electronic noise is mainly due to OTA1 input stage, noise decreases by increasing the bias current (see Eq. 2.11) at the cost of an increased power consumption. For very high current values, however, noise increases again. This occurs because the input capacitance of the differential pair increases once the transistors leave the sub-threshold region to enter the strong inversion region. Taking a nominal bias current of 50  $\mu$ A for the first stage of OTA1, the power consumption of the whole front-end (including also the second amplifier, the mixer, the  $g_m$ -C filter and their references) is equal to 450  $\mu$ W, with an input-referred capacitive noise of 30 zF/ $\sqrt{\text{Hz}}$  (see the point marked in orange in Fig. 2.15).

Fig. 2.16 shows the noise performance of the overall system (MEMS and ASIC), evaluated in terms of capacitive noise spectral density as a function of the mismatch between the drive and the resonance frequency. Here, the power consumption of the ASIC is the nominal value (450 $\mu$W). As the frequency mismatch is reduced, the MEMS thermomechanical noise contribution rises. At a 200-Hz offset from resonance, the noise contribution of the electronics and that of the sensor become comparable, resulting in a total equivalent capacitive noise of about $40 \text{ zF}/\sqrt{\text{Hz}}$.

Figure 2.14: System sensitivities, evaluated for a 200 Hz mismatch, for two values of the driving current.

The equivalent magnetic field resolution can be obtained by dividing the capacitive noise spectral density by the sensor sensitivity $S_m$, assuming a driving current of 107 $\mu A_{rms}$. Fig. 2.17 shows the input equivalent noise, quoted as rms magnetic field per unit sensing bandwidth $(nT/\sqrt{Hz})$, at different frequency mismatches. At frequency mismatches < 200 Hz, the sub-400 nT/$\sqrt{Hz}$ resolution is constant, limited by the MEMS thermomechanical noise (the dashed line represents the noise floor as the average of the first four points); at frequency mismatches larger than 200 Hz, the ASIC noise contribution becomes dominant and the resolution begins to degrade.
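The total capacitive noise at the 200-Hz mismatch can be reproduced by summing the two uncorrelated contributions in quadrature. The sketch below is illustrative only: it assumes the sensor contribution at that offset equals the 30 zF/$\sqrt{\text{Hz}}$ electronic noise, which is how "comparable" is read here, and the function name is hypothetical.

```python
import math


def total_noise(*contributions):
    """Root-sum-square of uncorrelated noise spectral densities."""
    return math.sqrt(sum(c**2 for c in contributions))


ELECTRONICS_ZF = 30.0   # input-referred ASIC noise [zF/sqrt(Hz)]
SENSOR_ZF = 30.0        # ASSUMPTION: sensor noise equal to the ASIC noise at 200 Hz

total_zf = total_noise(ELECTRONICS_ZF, SENSOR_ZF)
print(f"total capacitive noise: {total_zf:.1f} zF/sqrt(Hz)")
```

The result, slightly above 42 zF/$\sqrt{\text{Hz}}$, agrees with the measured value of about 40 zF/$\sqrt{\text{Hz}}$.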

Fig. 2.18 shows the input-referred Allan standard deviation graph, evaluated at a 200 Hz mismatch, using a 107 $\mu$A<sub>rms</sub> driving current, for different measurement configurations. Solid and dashed curves show the noise performance when the signal is demodulated using an external lock-in amplifier, with the AC driving current off and on, respectively. The circle-marker curve represents the Allan deviation using the mixer and the $g_m$-C filter, as in standard operation. The 1/f trend is due to the dependency of the sensitivity on the frequency (see Eq. 2.2). Fig. 2.18 clearly shows that both the driving current and the mixer introduce low frequency noise, i.e. an offset drift component, which in any case never exceeds 2 $\mu$T for integration times up to 10 s.

The system power consumption is the sum of the drive and the electronics contributions, amounting to 775 $\mu$W and 560 $\mu$W when driving at 107 $\mu$A<sub>rms</sub> and 41 $\mu$A<sub>rms</sub>, respectively.



Figure 2.15: Input-referred capacitive noise spectral density of the readout electronics as a function of the analog front-end power consumption.

# 2.3.3 Perspective for driving circuit integration

The integrated circuit discussed so far and the values of power dissipation given above include the magnetometer readout and the rms value of the Lorentz current, but not the power dissipated in the circuit needed to generate the reference current. The following three considerations guide the choice of the architecture of the driving part of the system:

- the used off-resonance operation mode requires the generation of a current at a reference frequency which is different from the device resonance;
- this frequency difference should be stable and immune to environmental changes (mostly temperature), as it directly affects the system sensitivity (Eq. 2.2);
- the circuit that delivers the Lorentz current should not critically affect the system power consumption.

Figure 2.16: Equivalent capacitive noise spectral density of the system measured at different frequency mismatches.

Considering the first two bullets above, the best option for the implementation of the driving sub-system appears to be the integration, in the same module of the magnetometer, of a high-Q MEMS resonator (e.g. a sample Tang configuration). The resonator should be designed to operate at the required frequency mismatch with respect to the magnetometer. In this way, the resonator will be affected by temperature variations in the same way as the MEMS magnetometer. Indeed, the Young's modulus variation with temperature (typically -30 ppm/K) is usually the dominant source of frequency variation with temperature in MEMS devices. To make a numerical example, a $\pm 60$ K temperature change around 300 K will nominally cause a $\pm 36$ Hz frequency shift for a magnetometer resonating at 20 kHz, and a $\pm 35.64$ Hz shift for a resonator with a 19.8 kHz frequency (200 Hz mismatch). In turn, this implies only a $\pm 0.18$ % change in mode split and sensitivity (see Eq. 2.2) across the whole temperature range of consumer devices.
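The numerical example above can be retraced as follows. The sketch applies the 30 ppm/K coefficient directly to the resonance frequency, as the quoted figures imply; this reading, like the function name, is an assumption made for illustration.

```python
def thermal_shift_hz(f0_hz, delta_t_k, coeff_ppm_per_k=30.0):
    """Magnitude of the frequency shift for a temperature change delta_t_k.

    ASSUMPTION: the ppm/K coefficient acts directly on the frequency,
    which is what reproduces the figures quoted in the text.
    """
    return f0_hz * coeff_ppm_per_k * 1e-6 * delta_t_k


F_MAGNETOMETER = 20e3    # magnetometer resonance [Hz]
F_RESONATOR = 19.8e3     # drive resonator, 200 Hz below
MISMATCH = F_MAGNETOMETER - F_RESONATOR

shift_mag = thermal_shift_hz(F_MAGNETOMETER, 60.0)
shift_res = thermal_shift_hz(F_RESONATOR, 60.0)
split_change = shift_mag - shift_res          # change of the 200 Hz mismatch
relative_change = split_change / MISMATCH     # fraction of the nominal mismatch

print(f"magnetometer shift : {shift_mag:.2f} Hz")
print(f"resonator shift    : {shift_res:.2f} Hz")
print(f"mismatch change    : {split_change:.2f} Hz ({relative_change * 100:.2f} %)")
```

Because both devices drift together, only the difference of the two shifts (0.36 Hz out of 200 Hz) affects the mode split.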

Considering the third bullet above, and thus the electronics, while a few examples of driving circuits for magnetometers implemented with discrete components exist [36], none of them addresses power consumption. When this constraint is considered, a low-power Pierce oscillator topology appears the most suitable, as it minimizes the number of required stages.

On the same chip as the presented readout circuit, the Pierce oscillator whose transistor-level schematic is shown in Fig. 2.19 was also integrated [37]. The circuit was dimensioned with both capacitances $C_1$ and $C_2$ of 9 pF (including the PAD capacitance), according to standard Pierce circuit design guidelines [38]. The degeneration of M1 is used to shift the DC output of the oscillator close to the threshold of the following inverter, in order to obtain a 50% duty cycle. The circuit was coupled and tested with a Tang resonator, which however (i) is not integrated in the same die as the magnetometer and (ii) is not provided with tuning electrodes for correct tuning of the frequency difference. Therefore complete tests of the magnetometer driven by the Tang-Pierce pair were not yet possible. A preliminary characterization shows the Pierce circuit correctly delivering the required 107 $\mu A_{rms}$ current. The current value is set via a selectable resistive load in series with the low resistance (< 0.6 k$\Omega$) of the metal loops. The current consumption added by the Pierce oscillator is only 22 $\mu$A (corresponding to 66 $\mu$W at the adopted 3-V voltage supply).

Figure 2.17: Input-referred magnetic field noise spectral density of the system measured at different frequency mismatches.

It can thus be concluded that the power dissipated in the driving circuit can be made very low (less than 1/10 of the overall consumption). The power dissipation discussed in Section 2.3.2 can therefore be considered representative of the whole magnetic field sensing system.
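The weight of the driving circuit in the power budget follows from simple arithmetic, sketched here in Python with the figures quoted in this chapter (the variable names are hypothetical):

```python
V_SUPPLY = 3.0            # supply voltage [V]
I_PIERCE = 22e-6          # added Pierce oscillator current [A]
P_SYSTEM = 775e-6         # readout plus Lorentz current at 107 uA rms [W]

p_pierce = I_PIERCE * V_SUPPLY            # 66 uW
p_total = P_SYSTEM + p_pierce             # 841 uW
pierce_fraction = p_pierce / p_total      # share of the overall consumption

print(f"Pierce power   : {p_pierce * 1e6:.0f} uW")
print(f"total power    : {p_total * 1e6:.0f} uW")
print(f"Pierce fraction: {pierce_fraction * 100:.1f} %")
```

The oscillator accounts for slightly less than 8% of the total, which supports the statement that it stays below one tenth of the overall consumption.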

A design of a Tang resonator with tuning electrodes [39] and positioned in the same die of the magnetometer, to match the intended frequency offset and to further complete the system characterization, represents ongoing work.

# 2.4 Conclusions and perspectives

A novel z-axis magnetic field sensor, operated in off-resonance mode, has been introduced and fully characterized together with a custom integrated readout circuit. The sensor exploits multiple loops to increase the sensitivity. It was fabricated in an industrial process currently used for 6-axis inertial sensors, without added magnetic or piezoresistive materials, and without any insulating barrier between metal and polysilicon. Within this class of devices, the resolution normalized to the device current consumption is better than that of all the z-axis sensors shown in Table 2, while also providing a bandwidth as large as 50 Hz. The sensor in [12] still shows better resolution performance. However, it adopts an insulating barrier for metal loops, and has a bandwidth inherently limited to 10 Hz only by the -3 dB resonant peak width. The sensor in [35] exploits piezoresistive amplification to improve the resolution, but needs a very large minimum biasing current and has a quite impractical system bandwidth of about 0.2 Hz.

| System           | Max BW<sup>*</sup> [Hz] | Driving Current [$\mu$A] | Device Length [$\mu$m] | Figure of Merit [$nT \cdot mA/\sqrt{Hz}$] | MEMS + IC Power [$\mu$W] | Notes and comments                              |
|------------------|------|-------|------|-----------------------|-----|-------------------------------------------------|
| Emmerich [10]    | 6    | 1000  | 1300 | 1200<sup>***</sup>    | N/A | AM res.<sup>**</sup>                            |
| Bahreyni [16]    | N/A  | 10000 | 420  | 1900                  | N/A | FM                                              |
| Kynnarainen [12] | 2.5  | 100   | 2200 | 7                     | N/A | AM res.<sup>**</sup> multicoil                  |
| Langfelder [13]  | 43   | 250   | 1060 | 520                   | N/A | AM res.<sup>**</sup>                            |
| Li [34]          | 1.9  | 8200  | 1800 | 140                   | N/A | AM res.<sup>**</sup> 3-axis<sup>****</sup>      |
| Li [17]          | 50   | 1280  | 370  | 640                   | N/A | FM                                              |
| Langfelder [21]  | 160  | 50    | 1060 | 170                   | N/A | AM off-res.                                     |
| Kumar [35]       | 0.2  | 7200  | 830  | 0.02                  | N/A | AM res.<sup>**</sup> + piezores.                |
| Li [18]          | N/A  | 4000  | 1200 | 80000                 | N/A | FM micro-leverage                               |
| This work        | 150  | 150   | 1400 | 58                    | 775 | AM off-res. multi-coil                          |

Table 2: Comparison of the presented magnetic field sensing system performance with the state-of-the-art

\* The final system bandwidth can be selected with proper electronic filtering up to the maximum bandwidth.

\*\* Max BW of AM resonant sensors is calculated as the -3 dB resonant peak width.

\*\*\* FOM inferred from other parameters given in the reference.

\*\*\*\* Performance referred to z-axis component only.

The integrated circuit for Lorentz force MEMS magnetometer readout was designed to keep the electronic noise contribution comparable to the sensor noise. The overall system, including the Lorentz current and the circuit biasing, has a power dissipation of 775 $\mu$W. This value increases to 841 $\mu$W when a Pierce circuit providing the reference current is also taken into account. Despite the use of custom PADs on the ASIC die, the system resolution is still affected by parasitic capacitances, and further strategies for their minimization will be considered in future work. At the same time, a better design of the metal masks, which takes into account the observed over-etch, and the use of high-resistance geometries for the links, should result in an at least 2-fold improvement of both sensitivity and resolution.

Recently, FM Lorentz force magnetometers have been presented [17]. Such systems represent an interesting development of Lorentz force based magnetic sensing, potentially insensitive to capacitive parasitics and less dependent on process and temperature variations than AM systems operated at resonance. Despite these considerations, off-resonance AM systems such as the one presented in this work are less sensitive to quality factor variations and exhibit specific sensitivity performance that is an order of magnitude higher than that provided by current FM technology. Moreover, this work presents sizing criteria for the entire sensing system including the readout electronics, giving the chance to explore the trade-offs between all the design metrics of interest, such as full scale, input-referred noise and bandwidth. These are all key issues that have to be taken into consideration to define the suitability of these systems for low-power consumer electronic applications, and that are still not clearly defined for FM systems.

As far as the future development of AM sensing systems, and of this work in particular, is concerned, a redesign is suggested to improve the dynamic range. This can be done by adopting a fully-differential and wide swing $g_m$-C filter topology directly connected to the mixer output. Finally, without changing the circuit architectures, power can be saved by reducing the bandwidth of the second amplifier and by lowering the bias current of the first stage of the LNA just to meet the higher noise floor of the sensor, whose performance is now defined.



Figure 2.18: System noise performance, displayed as input-referred Allan standard deviation, evaluated at 200-Hz mismatch, using a  $107-\mu A_{rms}$  driving current.



Figure 2.19: Transistor level view of the Pierce oscillator. Coupled to a Tang resonator, the circuit implements the driving stage at the oscillation frequency of the resonator, delivering the desired AC current through a resistive load.

# Chapter 3

# AMR magnetometer

Magnetic sensors have gained increasing attention in the aerospace and automotive markets and, more recently, also in the consumer electronics field [40], due to a growing interest in indoor and GPS-free navigation solutions.

Together with Hall effect sensors, Anisotropic Magneto-resistive Sensors (AMR) represent at the moment the largest fraction of sensors adopted in commercial and automotive products to sense magnetic field.

Such devices vary their resistivity as a function of both the orientation and the magnitude of an external magnetic field. Besides their suitability for the detection of Earth-like magnetic field magnitudes (a few Gauss), the success of AMR devices can be understood considering that their fabrication process, which typically relies on the sputtering of a NiFe film, requires few mask levels, so that the final cost is also relatively contained and competitive.

AMR sensing systems usually comprise a sensing element and a signal conditioning interface. While several commercial products combine the sensor with an integrated readout and driving interface, no example in the literature proposes an ASIC design approach together with the choice and sizing criteria for its building blocks. Published works rather propose discrete-component implementations, which are certainly useful for device characterization, but not suited to compact and low power portable applications.

The adoption of integrated circuitry for both the readout and any driving circuit in fact leads to both mechanical and electrical advantages, as well as to a reduction in form factor and cost.



Figure 3.1: Simplified scheme of the magnetoresistive property: The resistance value depends on the angle  $\theta$  between the current flow direction and the material magnetization vector, which depends on the surrounding magnetic field.

This chapter is devoted to a detailed description of the integrated electronics designed for the driving and sensing of a three axis AMR magnetometer. The next section briefly illustrates the motivation of this design, which was developed in parallel with many other activities. Then, the system requirements, the sensor characteristics, the system architecture and the details of the system's circuits are presented.

# 3.1 Motivation

This project was carried out with reference to the Lorentz force based sensing system for Earth field sensing, with two main goals. The first one was to gain knowledge of the design issues and critical aspects of AMR readout ASICs, by investigating and defining the fundamental trade-offs of a design oriented to optimal efficiency. In fact, although some companies have already faced such challenges to release commercial AMR SoCs, for reasons, the related documentation is rarely exhaustive in describing the circuit characteristics. This was a chance to face the problem and propose a detailed analysis.

The second goal was to compare the measurement results of this design with those of the Lorentz force sensing system, and thus to use the AMR as a benchmark to evaluate the potential of an emerging MEMS technology.

# 3.2 System requirements

Earth magnetic field sensing applications in general require medium range sensors, with a few-mGa resolution and a full scale over 1Ga. The anisotropic magnetoresistive (AMR) sensor lends itself well to this range of fields; it can sense static (dc) fields as well as the strength of the field component along a single direction. The transduction of the signal from resistive variation to voltage is provided by a full Wheatstone bridge. The sensitivity of such a structure built with the available sensors is 0.18 mV/V/Ga. To provide 3D sensing, the system requires a set of sensors and dedicated read-out electronics able to detect the field components along the three axes X, Y and Z. To be compliant with the consumer electronics market, the whole system power consumption must lie below 1mW. In this design, the target was set to $600\mu$W from a 1.8V power supply. A 50Hz bandwidth was required to sense both dc fields and fields deriving from fast movements of the user or modifications of the surrounding environment. The design targets are summarized in Table 3.1.
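To give a feeling for the signal levels implied by these requirements, the sketch below converts the bridge sensitivity into output voltages. It assumes, for illustration only, that the bridge is biased at the full 1.8 V supply; the actual bridge bias is not stated in this section, and the function name is hypothetical.

```python
SENSITIVITY = 0.18e-3     # bridge sensitivity [V/V/Ga]


def bridge_output(b_gauss, v_bridge, sensitivity=SENSITIVITY):
    """Differential bridge output voltage for a field b_gauss along the sensitive axis."""
    return sensitivity * v_bridge * b_gauss


V_BRIDGE = 1.8            # ASSUMPTION: bridge biased at the full supply voltage

v_per_gauss = bridge_output(1.0, V_BRIDGE)        # output for 1 Ga
v_resolution = bridge_output(4e-3, V_BRIDGE)      # output for the 4 mGa resolution
power_per_axis = 600e-6 / 3                       # total budget split over three axes

print(f"output per Gauss     : {v_per_gauss * 1e3:.3f} mV")
print(f"output at 4 mGa      : {v_resolution * 1e6:.2f} uV")
print(f"power budget per axis: {power_per_axis * 1e6:.0f} uW")
```

Under this assumption the smallest field to be resolved produces a bridge signal of only about 1.3 $\mu$V, which sets the noise level the read-out electronics must reach.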

## 3.3 The sensor

Unlike the Lorentz force based sensors, the AMR is a resistive sensor with a low or moderate resistance value of approximately $850\Omega$. Usually, the magnetic field-dependent variation of this resistance does not exceed 1% or 2%. The sensitivity of the sensor is defined as the resistive variation as a function of the external magnetic field component along the hard axis.

| Parameter                             | Value          |
|---------------------------------------|----------------|
| Magnetic field resolution $(B_{min})$ | $\approx 4$ mGa |
| Magnetic field full-scale $(B_{max})$ | $\pm 20$       |
| Dynamic Range $(DR)$                  | 76.5 dB        |
| System Bandwidth $(BW)$               | 50 Hz          |
| Sensor Sensitivity                    | 0.18 mV/V/Ga   |
| Power Budget                          | 200 $\mu$W/axis |
| Power Supply                          | 1.8V           |

Table 3.1: Design requirements



Figure 3.2: Scheme of a Barber pole-based AMR element and its resistive characteristic as a function of the angle between the magnetization vector and the average direction of the current flow

## 3.3.1 Operating principle

AMR devices exploit the dependence of the resistance of a thin NiFe alloy (Permalloy) layer on the angle between the current that flows through it and the layer magnetization (see the simplified scheme in Fig. 3.1), which can be described by a magnetization vector $\bar{M}$. Since the magnetization exists only if the internal magnetic domains are aligned in the same direction, during fabrication the film is deposited in a strong magnetic field. This field sets the preferred orientation, or easy axis, of the magnetization vector in the Permalloy resistors. In the presence of an external magnetic field $\bar{H}$ ($\bar{B}$ in the material), the magnetization vector $\bar{M}$ rotates from its intrinsic resting position by an amount which is proportional (ideally linear) to the magnitude of the component of $\bar{B}$ along the hard axis.

The resistive variation depends on the orientation of $\overline{M}$ and in particular on the angle $\theta$ that it forms with the current density vector $\overline{J}$. As shown in Fig. 3.2, the characteristic has a 180° periodicity, and an angle of 45° corresponds to both its most linear point and its maximum derivative. The magneto-resistive effect is very fast, and the typical bandwidth of these sensors is in the 1-5 MHz range. For the required application, the system bandwidth will be much smaller (50Hz) and thus limited by the read-out electronics.
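The properties of the characteristic mentioned above can be checked on the usual textbook model of the AMR effect, $R(\theta) = R_0 + \Delta R \cos^2\theta$. The sketch below is an illustration and not the device model used in this work: the 2% resistance variation and the use of $850\,\Omega$ as $R_0$ are assumptions taken from the orders of magnitude quoted in Section 3.3, and the function names are hypothetical.

```python
import math

R0 = 850.0            # ASSUMPTION: base resistance [ohm]
DELTA_R = 0.02 * R0   # ASSUMPTION: 2 % maximum resistance variation


def amr_resistance(theta_deg):
    """Textbook AMR law: R = R0 + dR * cos^2(theta)."""
    t = math.radians(theta_deg)
    return R0 + DELTA_R * math.cos(t) ** 2


def amr_slope(theta_deg):
    """dR/dtheta = -dR * sin(2*theta), in ohm per radian."""
    return -DELTA_R * math.sin(2 * math.radians(theta_deg))


def amr_curvature(theta_deg):
    """d2R/dtheta2 = -2 * dR * cos(2*theta); zero where the curve is most linear."""
    return -2 * DELTA_R * math.cos(2 * math.radians(theta_deg))


# Angle (in whole degrees) at which the slope magnitude is largest
steepest = max(range(0, 91), key=lambda a: abs(amr_slope(a)))

print(f"R(30 deg) = {amr_resistance(30):.3f} ohm, R(210 deg) = {amr_resistance(210):.3f} ohm")
print(f"steepest point at {steepest} deg, curvature there = {amr_curvature(steepest):.2e}")
```

In this model the slope magnitude peaks at 45°, where the curvature vanishes, and the resistance repeats every 180°, matching the behaviour described for Fig. 3.2.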



Figure 3.3: AMR Wheatstone bridge with Set-Reset coils (a) and an example of Set-Reset offset cancellation technique scheme (b).

#### 3.3.2 Implementation and features

Barber pole metal bars [40] are placed at  $45^{\circ}$  with respect to the initial magnetization of the magneto-resistor. In this way the current is forced to flow at  $45^{\circ}$  with respect to the longest resistor axis (easy axis) ensuring a linear response.

The magnetic field information is translated into an electrical signal by a proper placement of four magneto-resistive elements in a Wheatstone bridge configuration, as illustrated in Fig. 3.3(a).

To maximize the sensitivity, all the bridge elements are magneto-resistors, with cross-coupled barber pole orientation. It is worth noting that each element is surrounded by a coil which is electrically isolated from the resistor. The function of these coils is to allow an on-demand remagnetization of the sensor along its easy axis. By injecting a high current (hundreds of mA) into the coils through a proper driving circuit, it is possible to induce a magnetic field along the sensor's easy axis and thus to re-align and enforce the magnetization vector in the same direction. This operation has not only been shown to improve the sensor sensitivity over long-term operation, but can also be exploited to correct the sensor offset through a so-called SET-RESET operation. Switching a high current pulse through the coils creates a large magnetic field of 60-100 Gauss, which is able to coherently align the magnetic domains and restore the  $\overline{M}$  vector. This process is referred to as flipping the magnetic domains with a set pulse. The flipping action also takes place for a pulse in the opposite direction through the same coil. In this case, with the reset pulse, the magnetization vector will point in the opposite direction along the easy axis. By performing a set pulse and then a reset pulse, measuring the bridge output after each pulse and subtracting the two measurements, it is possible to cancel the offset (which keeps its sign after each pulse) and preserve the magnetic field information.
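The SET-RESET subtraction can be illustrated with a minimal numerical sketch. It assumes an idealized bridge whose output is the sum of a field term that changes sign with the magnetization and an offset that does not; the function name, the halving of the difference and the example values (0.5 Ga field, 1 mV offset) are illustrative assumptions, not part of the original design.

```python
# Idealized SET/RESET offset cancellation (illustrative sketch only).
# Assumption: v_set = +S*B + offset and v_reset = -S*B + offset.

def set_reset_field(v_set, v_reset):
    """Return (signal, offset) recovered from the two bridge readings."""
    signal = (v_set - v_reset) / 2.0   # offset cancels in the difference
    offset = (v_set + v_reset) / 2.0   # field term cancels in the sum
    return signal, offset

SENS_V_PER_GA = 0.18e-3 * 1.8   # 0.18 mV/V/Ga at a 1.8 V bridge bias
field_ga = 0.5                  # hypothetical applied field
offset_v = 1e-3                 # hypothetical bridge offset

v_set = SENS_V_PER_GA * field_ga + offset_v
v_reset = -SENS_V_PER_GA * field_ga + offset_v
signal, offset = set_reset_field(v_set, v_reset)
print(f"signal = {signal * 1e6:.1f} uV, offset = {offset * 1e3:.3f} mV")
```

In this model the recovered signal no longer depends on the offset, even though the offset is several times larger than the field-induced voltage.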

#### 3.3.3 Integration

A thin NiFe layer can be patterned in order to make it sensitive to magnetic fields in the x or y direction (in plane), but for compass applications 3-axis sensitivity is mandatory. To address this need, the state-of-the-art solution is to use two sensor chips: one is assembled in the standard way on the ASIC or package frame to sense the x and y axes, while the other is assembled vertically on the ASIC/frame, as shown in Fig. 3.4. However, this approach limits the possibility of scaling down the device. To overcome this limitation, an alternative in-plane solution for the z-axis sensor was also investigated in this project, but since it is not very relevant to the readout circuit design (which has to be manufactured on a separate die, with a CMOS process) it is not discussed in this chapter.

Figure 3.4: Example of planar AMR sensing system integrated with an inertial sensing system. The z-axis sensor is placed in vertical.

Figure 3.5: Simplified scheme of signal conditioning chain devoted to the readout of a single-axis.

# 3.4 System overview

Figure 3.5 shows the block diagram of the read-out signal conditioning chain. The fundamental blocks required for a proper signal conditioning are an amplifier and an analog-to-digital converter. In Fig. 3.5 the power supplies of the bridge (VDDB), of the amplifier (VDDA, analog) and of the analog-to-digital converter (VDDD, which also supplies any other digital circuits) are shown as separate, to deal with the particular power management that is necessary to meet the efficiency requirements.

#### 3.4.1 Fundamental considerations

#### Analog read-out

Since the resistance of the AMR resistors is about  $850\,\Omega$ , to limit the power consumption either the bridge bias should be reduced to less than 100mV or the supply voltage (e.g. 1.8V) must be switched. In the latter approach, the bridge bias is activated only for a  $T_{on}$  time interval, with a duration long enough to read and amplify the signal. The on-period  $T_{on}$  is set by considering



Figure 3.6: Fundamental Wheatstone bridge readout scheme in which a single branch is sensed by means of a common source mirrored amplifier.

two different constraints: i) if the power budget is 170  $\mu W$  per axis, the duty cycle, i.e. the ratio between  $T_{on}$  and the sampling period  $T_{sample}=10$ ms, has to be lower than 5%, corresponding to  $T_{on}<500\mu$ s; ii) the front-end amplifier must be fast enough to settle within  $T_{on}$  with a reasonable accuracy. The stage is turned on and off together with the bridge. However, since the amplifier bandwidth  $(BW_{ampl})$  is larger than the sampling frequency  $(f_{sample}=100\text{Hz})$ , folding takes place for both bridge and electronic noise. To meet the minimum detectable magnetic field allowed by the sensor, net of folding and of the desired bandwidth, it is necessary not to significantly increase the input referred noise power spectral density above the minimum value fixed by the AMR resistors. Consider the fundamental readout scheme depicted in Fig. 3.6, in which a single branch of a Wheatstone bridge is sensed by a common source amplifier. The amplifier input referred noise can be written as follows:

$$\bar{E}_{n,in}^2 = 2kTR + 4kT\frac{\gamma}{g_m}(1+\alpha) \tag{3.1}$$

where  $g_m$  is the MOS transconductance and  $\alpha$  takes into account the contribution of the load transistor. Assuming a sub-threshold operating regime and requiring a transistor noise contribution ten times lower than that of the resistor, we have:

$$2kTR \approx 10 \cdot 4kT\frac{\gamma}{g_m}(1+\alpha) \tag{3.2}$$

Taking  $g_m = I_{mos}/(nU_T)$  and  $\gamma \approx n/2$  in weak inversion, and  $R = VDDB/(2I_{bridge})$  for the sensed branch, this becomes:

$$\frac{VDDB}{I_{bridge}} \approx \frac{20n^2 U_T(1+\alpha)}{I_{mos}} \tag{3.3}$$

and thus the ratio between the circuit bias current and the bridge current is:

$$K_I \approx \frac{I_{mos}}{I_{bridge}} = \frac{20n^2 U_T (1+\alpha)}{VDDB} \tag{3.4}$$

where n is the sub-threshold slope factor and  $U_T$  is the thermal voltage. For a 1.8V power supply and a reasonable  $\alpha \approx 0.5$ , the ratio expressed in Eq.(3.4) is approximately equal to 1. This result suggests that it is possible either to keep the electronic noise negligible while doubling the power consumption or, on the other hand, to double the noise PSD (thus increasing the rms noise by a factor  $\sqrt{2}$ ) while keeping the power consumption of the electronics negligible. Obviously, this is true only for the fundamental scheme of Fig.3.6. In practice, the circuitry needed to read the signal is more complex: it requires more than two transconductors, as well as other active and passive elements that typically raise  $K_I$ . An optimal design and a clever choice of circuit topologies is the task of the analog designer. The details of the circuit design and of the noise and efficiency analysis will be discussed later in the chapter but, with this issue clarified, it is possible to proceed with the general analysis.
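As a quick numerical check of Eq. (3.4), the sketch below evaluates $K_I$ for a 1.8 V bridge supply and $\alpha = 0.5$. The temperature (300 K) and the two slope-factor values are assumptions chosen only for illustration, since the text does not state them.

```python
# Numerical evaluation of Eq. (3.4): K_I = 20 n^2 U_T (1 + alpha) / VDDB.
# The slope factors and the 300 K temperature are illustrative assumptions.

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C

def thermal_voltage(temp_k):
    """Thermal voltage U_T = kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON

def current_ratio(n, u_t, alpha, vddb):
    """Ratio between MOS bias current and bridge current, Eq. (3.4)."""
    return 20.0 * n**2 * u_t * (1.0 + alpha) / vddb

u_t = thermal_voltage(300.0)
for n in (1.3, 1.5):
    k_i = current_ratio(n, u_t, alpha=0.5, vddb=1.8)
    print(f"n = {n}: K_I = {k_i:.2f}")
```

For slope factors in this range the ratio stays between roughly 0.7 and 1, in line with the "approximately equal to 1" estimate.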

Taking only into account the bridge resistor noise and denoting as NF= $2BW_{amp}/f_{sample}$  [41] the noise folding factor, the minimum rms input referred noise is:

$$V_{min} > \sqrt{4kTR \cdot NF \cdot \frac{f_{sample}}{2}} = \sqrt{4kTR \cdot BW_{amp}} \tag{3.5}$$

where R is the resistance of each bridge element. To meet an input referred noise of  $0.324 \mu V$  corresponding to a 1mGa field with a sensitivity of 0.18 mV/V/Ga, we have:

$$BW_{amp} < 6.3kHz \tag{3.6}$$
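The bound of Eq. (3.6) follows from inverting Eq. (3.5). The sketch below does this numerically; the temperature of 300 K is an assumption, and the resistance is evaluated for two values because the exact figure used in the original calculation is not stated.

```python
# Inverting Eq. (3.5): BW_amp < V_min^2 / (4 k T R).
# Temperature and the second resistance value are illustrative assumptions.

K_BOLTZMANN = 1.380649e-23  # J/K

def max_amp_bandwidth(v_noise_rms, r_ohm, temp_k=300.0):
    """Largest amplifier bandwidth (Hz) keeping the folded resistor noise below v_noise_rms."""
    return v_noise_rms**2 / (4.0 * K_BOLTZMANN * temp_k * r_ohm)

# 1 mGa with 0.18 mV/V/Ga sensitivity and 1.8 V bias -> 0.324 uV
v_target = 0.18e-3 * 1.8 * 1e-3

bw_850 = max_amp_bandwidth(v_target, 850.0)
bw_1k = max_amp_bandwidth(v_target, 1000.0)
print(f"R = 850 ohm: BW_amp < {bw_850 / 1e3:.1f} kHz")
print(f"R = 1 kohm : BW_amp < {bw_1k / 1e3:.1f} kHz")
```

With these assumptions the bound is a few kHz in both cases; a resistance of about 1 kΩ reproduces the 6.3 kHz figure, while 850 Ω gives a slightly looser limit.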

On the other hand, to settle with an error lower than one LSB ( $220\mu V$  at the amplifier output), the  $T_{on}$  period has to be larger than  $1.5/BW_{amp}$ , thus leading to  $T_{on} > 30\mu s$. In summary,  $T_{on}$  must be chosen within the range:

$$30\mu s < T_{on} < 500\mu s \tag{3.7}$$

That is quite an uncomfortably tight window to deal with while still allowing for process variations. Moreover, this estimate only includes the resistor noise contribution while, as discussed above, the electronics would reasonably raise the system noise, shrinking the on-time window to impractical or at least unsafe values. Having acknowledged this issue, we decided to relax the resolution target from 1mGa to a more approachable value of 4mGa and, to keep a safety margin, we chose  $BW_{amp} = 30$ kHz and  $T_{on} = 275 \mu$ s. With these values the sensor noise contribution alone in a 50Hz system bandwidth (and considering folding phenomena) would be equal to 2.6mGa. Moreover, the headroom available for the electronic contribution to power consumption has been increased. This, as we will see later in this section, offered the opportunity of featuring an auto-zero procedure to clean each measurement from the electronic offset (without necessarily performing the SET-RESET) and to reduce the flicker noise.

#### Analog-to-Digital Conversion

Although the target resolution of the ADC required for the AMR system lies at the upper boundary of the intermediate region, and would thus suggest the adoption of a  $\Delta\Sigma$  ADC, the chosen multiplexed architecture and the need for a power-switched operating regime forced us to discard this option, raising the challenge of designing a high-resolution, low-power SAR ADC. This effort is supported by the consideration that proper circuit design techniques can strongly reduce the impact of technological mismatch and pave the way to SAR ADCs with better efficiency than  $\Delta\Sigma$  converters even at higher resolutions [42].

#### 3.4.2 Architecture

The target of this design is a 3-axis system, thus a 3-channel front-end directly fed by three bridges that, together with their front-end electronics, will be turned on and off periodically (100Hz ODR) by a power management unit. To save area, the analog-to-digital conversion is performed by multiplexing the three channels to a single ADC, instead of using one ADC per channel, as shown in Fig. 3.7. The required 76dB dynamic range of the sensing system would



Figure 3.7: AMR readout ASIC architecture.

suggest a nominal resolution of 14bit (12.4 effective). Such a value lies in a region where SAR ADCs and  $\Delta\Sigma$  ADCs, which are the two most efficient options for moderate and low conversion rates, overlap. The challenge of designing a SAR ADC with such a resolution is due to the limit that technological mismatch imposes on the converter accuracy. Sigma-delta converters would be appealing; however, given the multiplexed architecture, this solution has been discarded since it can only operate in a continuous-time, oversampling regime. As a consequence, the design challenge represented by a high-efficiency 14-bit SAR ADC has been accepted.
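The 12.4 effective bits quoted for the required dynamic range can be checked with the standard relation between the SNR of an ideal quantizer and its resolution, $ENoB = (DR - 1.76)/6.02$. The sketch assumes the 76.5 dB figure of Table 3.1; the function name is illustrative.

```python
# Effective number of bits corresponding to a given dynamic range,
# using the ideal-quantizer relation DR = 6.02 * ENoB + 1.76 dB.

def enob_from_dynamic_range(dr_db):
    """Effective number of bits for a dynamic range expressed in dB."""
    return (dr_db - 1.76) / 6.02

enob = enob_from_dynamic_range(76.5)
print(f"DR = 76.5 dB -> ENoB = {enob:.1f} bit")
```

The gap between this effective value and the 14-bit nominal resolution leaves room for the non-idealities of the converter.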

#### 3.4.3 Set-Reset feature

In AMR applications, since the bridge offset is typically larger than the desired signal, a SET/RESET technique is often adopted. It consists of measuring the magnetic field with opposite magnetization of the sensors, by applying opposite current pulses  $(0.5A/2\mu s)$  to a pair of auxiliary built-in coils. To limit power consumption, the SET/RESET technique is adopted in this SoC to measure the offset at a frequency much lower than the output data rate (ODR), e.g. 1 Hz for an ODR of 100 Hz. The offset value is then subtracted from all the following samples. After 1s a new SET/RESET procedure updates the offset value, thus making it possible to compensate for slow offset variations with a 1-s time constant. The procedure takes an average power consumption of  $1.8\mu W$ , which is negligible with respect to the power budget.

# 3.5 Mixed-signal Front-End

The analog front-end of each channel is composed of a fully-differential auto-zeroing instrumentation amplifier connected to a charge redistribution SAR ADC by means of an analog multiplexer. A custom-designed control logic manages the operation timing of the auto-zero



Figure 3.8: AMR readout ASIC architecture providing the serial digital output of the three sensed axes.

while another logic block, designed from a VHDL source code through an automated digital flow supported by Cadence RTL Compiler and Encounter, was combined with the analog-to-digital converter to realize the offset correction and provide the serial output of the three channels. In particular, an S/R signal synchronized with the read-out is sent off chip to trigger an optional external SET/RESET driving circuit responsible for injecting the proper currents into the sensor's coils.

Both the logic blocks are driven by a 32kHz XTAL clock (Fig 3.8).

#### 3.5.1 Amplifier

The front-end electronics is made of an instrumentation amplifier (INA) with a digitally programmable resistive feedback (see Fig.3.9). Its gain is set by the ratio between  $R_F$  and  $R_G$  (500 $\Omega$  approximately) and can be preset from a minimum of 400 to a maximum of 900. The maximum gain value is limited by channel saturation due to the bridge offset, which was expected to be of the order of 1mV. On the lower side, instead, the minimum gain value is set to keep the ADC quantization noise (LSB/ $\sqrt{12} \approx 63\mu$ V), when referred to the input terminals, below  $0.5\mu$ V, which corresponds to the safety margin on the signal generated by a 1mGa field.
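The two gain limits can be illustrated numerically. The sketch below uses the values quoted in the text (a 220 µV LSB, a 1 mV bridge offset, gains of 400 and 900); the helper names are hypothetical and the calculation is only a first-order check, not the sizing procedure actually followed.

```python
import math

# First-order check of the INA gain range (values taken from the text).
LSB_V = 220e-6          # ADC LSB at the amplifier output
BRIDGE_OFFSET_V = 1e-3  # expected order of magnitude of the bridge offset

def input_referred(v_out, gain):
    """Refer an output-side voltage to the amplifier input."""
    return v_out / gain

def amplified_offset(offset_in, gain):
    """Output-side voltage produced by an input offset."""
    return offset_in * gain

q_noise = LSB_V / math.sqrt(12.0)   # rms quantization noise, about 63 uV
for gain in (400, 900):
    qn_in = input_referred(q_noise, gain)
    off_out = amplified_offset(BRIDGE_OFFSET_V, gain)
    print(f"G = {gain}: input-referred q-noise = {qn_in * 1e6:.3f} uV, "
          f"offset at output = {off_out:.2f} V")
```

At the minimum gain the input-referred quantization noise is already well below 0.5 µV, while at the maximum gain a 1 mV offset alone takes up half of the 1.8 V supply, which is why the gain cannot be raised further.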

The two resistors  $R_{CM}$  of  $8k\Omega$  each were placed to reduce the common mode loop gain of the two input operational transconductance amplifiers (OTAs) and relax their common mode stability requirements, without impairing either the differential gain or the noise, which are both determined by  $R_G$  (since its value is much smaller). The circuit power consumption is mainly determined by the two input amplifiers, and in particular by their input differential stage. The bias current has been set taking into consideration both the bandwidth and the noise constraints, so as to make the INA equivalent input noise lower than or comparable with the bridge resistance contribution.

As anticipated in the previous section, in order to remove the flicker noise and the electronic offset a correlated double sampling (CDS) technique was adopted. With reference to Fig.3.9, in the SOS (sample offset) phase, the amplifier inputs are both connected to a reference voltage and the value due to offset and flicker noise is stored on the capacitors  $C_H$  of 20pF. In the COS (correct offset) phase, the amplifier input terminals are



Figure 3.9: Circuit implementation of the amplifying stage of a single axis.



Figure 3.10: Timing diagram of the AMR analog readout control signals.

connected to the bridge and the two capacitors are directly connected to the output differential buffer in order to subtract the sampled offset/flicker noise from the signal. Fig.3.10 shows the timing diagram of the different signals involved in the front-end, where VDDA and VDDB are the supply voltages of the front-end amplifier and of the bridge, respectively, SOS is the sample-offset phase and COS the correct-offset phase. At the rising edge of the sampling signal (SAMPLE), the front-end amplifier is activated and the offset is sampled. After 250 $\mu$ s, the bridge is also powered, the signal is amplified and the offset subtracted. Since the offset sampling and the signal sampling occur with a time difference of  $225\mu$ s, the high-pass corner frequency is set to  $1/225\mu\mathrm{s} \approx 4.4$ kHz.
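The effect of correlated double sampling on slow disturbances can be sketched with the textbook model of an ideal difference between two samples taken $\Delta t$ apart, whose magnitude response is $2|\sin(\pi f \Delta t)|$. This is an idealized assumption: it ignores the amplifier bandwidth and the actual sampling network, and is meant only to show why offset and low-frequency flicker noise are strongly attenuated with $\Delta t = 225\,\mu$s.

```python
import math

# Ideal CDS model: y(t) = x(t) - x(t - delta_t)  ->  |H(f)| = 2 |sin(pi f delta_t)|.
# Illustrative sketch; the real front-end response also depends on the amplifier.

DELTA_T = 225e-6  # time between offset sampling and signal sampling (s)

def cds_gain(freq_hz, delta_t=DELTA_T):
    """Magnitude response of an ideal two-sample difference."""
    return 2.0 * abs(math.sin(math.pi * freq_hz * delta_t))

def cds_gain_db(freq_hz, delta_t=DELTA_T):
    """Same response expressed in dB (freq_hz must not sit on a null)."""
    return 20.0 * math.log10(cds_gain(freq_hz, delta_t))

for f in (1.0, 10.0, 100.0):
    print(f"f = {f:6.1f} Hz: |H| = {cds_gain_db(f):6.1f} dB")
```

In this model the attenuation improves by 20 dB per decade towards low frequencies, so disturbances around 1 Hz are suppressed by more than 50 dB.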

#### 3.5.2 SAR ADC

The successive-approximation register ADC (SAR ADC) is based on the binary search algorithm and requires a simple structure made of a sample and hold (S/H), a comparator, a digital-to-analog converter (DAC) and the successive approximation register (SAR). The DAC is controlled by the SAR logic and its output voltage varies depending on the decision of the comparator, as shown in Figure 3.11. Besides its simple structure, the converter also features a low power consumption.
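The binary search performed by the SAR can be sketched behaviourally as follows. This is a generic single-ended, ideal-DAC model written for illustration (names and the 4-bit example values are assumptions); it is not the fully-differential charge-redistribution implementation described in the following paragraphs.

```python
# Behavioural model of a successive-approximation conversion (ideal DAC).

def sar_convert(v_in, v_ref, n_bits):
    """Return the n_bits output code for v_in in [0, v_ref)."""
    code = 0
    for bit in reversed(range(n_bits)):        # MSB first
        trial = code | (1 << bit)               # tentatively set the bit
        v_dac = v_ref * trial / (1 << n_bits)   # ideal DAC output
        if v_in >= v_dac:                       # comparator decision
            code = trial                        # keep the bit
    return code

code = sar_convert(0.6, 1.0, 4)
print(f"4-bit code for 0.6 V with a 1 V reference: {code:04b}")
```

Each iteration halves the search interval, so an N-bit conversion requires N comparator decisions.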



Figure 3.11: Successive-approximation ADC structure and an example of 4-bit analog-to-digital conversion

#### Circuit architecture

For the sake of brevity, and since this SAR ADC was neither the only one nor the first to be designed within the framework of this PhD course, the reader is referred to Section 4.2 of the chapter devoted to the neural probing system, or directly to [43] [44] [45] [46], for a more detailed description of the SAR ADC working principle. Unlike the 10-bit design in  $0.13\mu m$  CMOS, this design came after two previous design experiences in the same  $0.35\mu$m technology chosen for this project: an 8-bit [29] and a 12-bit SAR ADC. In these two designs, as in the 10-bit  $0.13 \mu m$  CMOS one, the possibility of designing a high-efficiency SAR converter using process design-kit (PDK) capacitors instead of custom unit capacitors was investigated but, unlike the case of the 10-bit converter, it led to unsatisfactory measurement results. Both designs showed an unexpectedly high mismatch together with a strong susceptibility to parasitics of the cpoly (polysilicon-oxide-polysilicon) capacitors, the only ones available in the process design-kit. In the 8-bit converter the mismatch made it necessary to oversize the unit capacitor to meet an effective resolution of 7.3bit (which was still suitable for the application), while in the 12-bit converter it led to a much stronger degradation of the effective resolution, down to unacceptable values. In this section, the design of a fully-differential SAR ADC with an even higher nominal resolution (14 bit) in  $0.35\mu$m for the AMR SoC is presented up to post-layout simulations (at the moment the chip is in fabrication). Each circuit block is described: the capacitive DAC, the successive approximation logic with its switching procedure, and the comparator.

#### Capacitive DAC

The 14-bit nominal resolution required for this design imposed the adoption of a capacitive DAC with a binary weighted with attenuation capacitor (BWA) architecture (see Fig. 3.13), so as to avoid the uncomfortable and critical task of wiring up to  $2^{14}$  unit elements that would result from the adoption of a classic binary weighted (CBW, Fig. 3.13) capacitive DAC. The BWA topology is intrinsically more susceptible to mismatch and parasitics than the CBW. In fact, as also discussed in Chapter 4, the top-plate parasitics  $C_{par,sub}$  cause a periodic DNL pattern with positive peaks whose height depends on the amount of parasitic capacitance. To reduce the



Figure 3.12: Simplified schematic of a charge redistribution SAR ADC comprising the DAC (which is also the S&H), the comparator and the logic.

impact of mismatch due to both linear and radial oxide gradients, the common centroid disposition shown in Fig.3.14 was adopted in the layout, at the price of an increased but not critical area of  $800\mu m \times 800\mu m$. Such a disposition still requires a complex wiring network to connect the bottom plates of the capacitive banks to their related bit drivers. This would have been critical if the standard PDK capacitors had been adopted, due to their anisotropic geometry (which degrades the matching performance) and their susceptibility to parasitics. For this reason, a custom unit capacitor structure was developed, exploiting the same polysilicon layers but also the poly-metal parasitics, through the implementation of a sandwich structure. Fig. 3.15 illustrates the implementation of the custom unit elements realized in the 2P4M process. The equivalent specific capacitance  $c_{spec}$  is  $0.8 \text{fF}/\mu m^2$ : 89% of it is poly1-poly2 capacitance, while the remaining 11% is poly-metal and MoM capacitance between the TOP layers and the BOTTOM shell which encloses the TOP. It is worth noting the absolutely isotropic geometry. This feature allows the units to be treated as simple *slots*, easing their placement for any desired mapping of the DAC capacitive elements. The presence of a wide surface of metal-3 (ME3), named SHIELD, which covers almost all the structures in the lower layers and should be connected to ground, minimizes the capacitive coupling between the wires that connect the top plates, drawn in ME4, and the bottom wires, drawn in ME2. The shield reduces the top-to-bottom parasitics to negligible values (see Fig.3.16), concentrating almost all the parasitics in the two well-known  $C_{par,top}$  and  $C_{par,sub}$. This is advantageous since the first only produces a gain error, while the second introduces a deterministic non-linearity which can be compensated by a specific array sizing and calibration.
The compensation of such non-linearities will be discussed later in the chapter.

The size of the unit capacitor has been chosen high enough to keep the residual statistical mismatch (net of the gradients) on the DNL well below the limit of the missing code. This has been made by imposing the value of the worst-case DNL standard deviation, which occurs at



Figure 3.13: 14-bit CBW a) and BWA b) DAC array topology.

the mid-code [45], to be three times smaller than 0.5:

$$3\sigma_{DNL,max} < 0.5\tag{3.8}$$

The value of  $\sigma_{DNL,max}$  can be expressed as:

$$\sigma_{DNL,max} \approx \alpha \cdot \frac{\sigma_C}{C_u} = \alpha k_c \cdot \sqrt{\frac{c_{spec}}{2C_u}} \tag{3.9}$$

where  $k_c$  is the Pelgrom mismatch coefficient of the capacitors (0.5 %· $\mu$ m),  $c_{spec}$  is the specific capacitance and  $\alpha$  is a coefficient which depends on the nominal resolution N and on the switching algorithm. The latter, which will be better described in the next paragraph, was chosen to provide an  $\alpha$  that, according to Monte Carlo simulations, is equal to:

$$\alpha = 2^{\frac{(3N-4)}{4}} \tag{3.10}$$

By substituting Eqs. (3.10) and (3.9) into Eq. (3.8) it is possible to get the expression of the minimum required unit capacitance:

$$C_u \geq 18\alpha^2 k_c^2 c_{spec} \approx 190\,\mathrm{fF} \tag{3.11}$$

To take a safer margin, the final value of the unit capacitor has been oversized by roughly 30% and set to 250 fF.
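The sizing of Eq. (3.11) can be reproduced numerically from the values given in the text ($N = 14$, $k_c = 0.5\,\%\cdot\mu$m, $c_{spec} = 0.8$ fF/$\mu$m$^2$). The function names are illustrative.

```python
# Minimum unit capacitance from Eqs. (3.8)-(3.11):
#   C_u >= 18 * alpha^2 * k_c^2 * c_spec,  alpha = 2^((3N - 4) / 4)

def alpha_coeff(n_bits):
    """Mismatch amplification coefficient of Eq. (3.10)."""
    return 2.0 ** ((3 * n_bits - 4) / 4.0)

def min_unit_cap_ff(n_bits, k_c_um, c_spec_ff_per_um2):
    """Minimum unit capacitance in fF for 3*sigma_DNL,max < 0.5 LSB."""
    return 18.0 * alpha_coeff(n_bits) ** 2 * k_c_um ** 2 * c_spec_ff_per_um2

c_min = min_unit_cap_ff(n_bits=14, k_c_um=0.005, c_spec_ff_per_um2=0.8)
margin = 250.0 / c_min - 1.0
print(f"C_u,min = {c_min:.0f} fF, margin with 250 fF = {margin * 100:.0f} %")
```

The result is just under 190 fF, so the chosen 250 fF unit corresponds to an oversizing of roughly 30%.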

#### Switching algorithm

The adopted switching procedure is similar to the monotonic procedure presented in [47] and used in the 10-bit converter described in Chapter 4.23. The only difference is the position



Figure 3.14: Layout of a single branch of the 14-bit BWA array.

of the bottom plate switches of the MSB capacitors during the sampling phase, which are now connected to ground instead of VDD. After the first bit evaluation, the bottom plate of the MSB capacitor connected to the branch whose top plate stored the smaller sampled voltage is switched to VDD, determining an initial increase of the common mode voltage. After that, the switching procedure continues as in the monotonic one, determining the progressive drop of the common mode voltage down to the asymptotic value of VDD/2.
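The common-mode behaviour described here can be visualized with a simple behavioural model. The sketch assumes an ideal binary-weighted array on each branch (no attenuation capacitor, no parasitics), top-plate sampling and an input common mode of VDD/2; the function name and the example input are illustrative and do not reproduce the actual circuit.

```python
# Behavioural sketch of the modified monotonic switching (ideal array).

def modified_monotonic(v_ip, v_in, vdd, n_bits):
    """Return (code, common-mode trace) for a differential input."""
    bits = []
    cm_trace = [(v_ip + v_in) / 2.0]
    for i in range(n_bits):
        bit = 1 if v_ip > v_in else 0          # comparator decision
        bits.append(bit)
        if i == n_bits - 1:
            break                              # last bit: no DAC switching
        step = vdd / 2 ** (i + 1)
        if i == 0:
            # MSB capacitor of the lower branch: ground -> VDD (voltage rises)
            if bit:
                v_in += step
            else:
                v_ip += step
        else:
            # monotonic steps: capacitor of the higher branch: VDD -> ground
            if bit:
                v_ip -= step
            else:
                v_in -= step
        cm_trace.append((v_ip + v_in) / 2.0)
    code = int("".join(str(b) for b in bits), 2)
    return code, cm_trace

VDD = 1.8
code, cm = modified_monotonic(1.2, 0.6, VDD, 14)
print(f"code = {code}, CM start = {cm[0]:.3f} V, "
      f"CM peak = {max(cm):.3f} V, CM end = {cm[-1]:.4f} V")
```

In this model the common mode first rises by VDD/4 and then decreases step by step, ending within a fraction of a millivolt of its starting value of VDD/2.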

#### Calibration

The capacitive DAC design approach described in the previous paragraph aims to minimize the top-to-bottom parasitics of each capacitive bank but increases the parasitic contribution  $C_{par,sub}$  connected from the top plate of the sub-DAC to ground. This parasitic, which could reach values up to 1pF, introduces a differential non-linearity characterized by a deterministic and periodic pattern. In particular, with the chosen symmetrical BWA DAC and switching algorithm, the DNL peaks are positive and occur in couples every 128 (2<sup>7</sup>) codes, as shown in Fig.3.17. The same effect occurs if the attenuation capacitor value is slightly lower than the nominal one,  $C_u$ . It can be demonstrated, in fact, that the height of the DNL peak depends



Figure 3.15: Section showing the custom unit elements implementation and a placement and wiring example.

both on  $C_{par,sub}$  and  $C_u$ :

$$DNL_{peak} \approx \frac{2^{N}C_{par,sub} - 2^{\frac{3N}{2}}(C_{att} - C_{u})}{2^{N+1}C_{u}} \tag{3.12}$$

Eq. (3.12) shows that the top-plate parasitic and an increased value of  $C_{att}$  produce opposite deterministic effects, and thus the second can compensate the first. The calibration operating principle relies on this consideration. To perform this kind of compensation it has been necessary to control the value of the attenuation capacitor over a certain range of values centered on  $C_u$ . This feature was made possible by placing a number of *calibration* modules in parallel with  $C_{att}$ , as illustrated in figure 3.18. Each module is controlled by a bit  $D_C$  whose value determines if the internal capacitance  $C_{CM}$  is connected in parallel with  $C_{att}$  or from the sub-DAC top plate to ground. This control must deal both with process and mismatch variations of  $C_u$  and with a general uncertainty on the effective value of  $C_{par,sub}$ . It is worth noting that if, for any manufacturing issue,  $C_{att}$  were higher than  $C_u$  and than the value required to compensate for  $C_{par,sub}$ , this calibration would be ineffective. In this case, in fact, the DNL peaks would already be negative and it would only be possible to add capacitance, thus increasing and not reducing the non-linearity. An accurate parasitic extraction of the whole DAC reported a parasitic value of 588fF. To take a safe margin, the nominal value of  $C_{att}$  was reduced by 15% with respect to  $C_u$ . The number of calibration modules was set to 64 and the chosen value of  $C_{CM}$  was 1.4fF, able to compensate the non-linearity with a 0.2LSB step. Since only the number of calibration modules connected in parallel to the attenuation capacitor



Figure 3.16: The shield reduces the top-to-bottom parasitics.



Figure 3.17: Detail of the DNL periodic peaks (a) and the related non-linearity of the I/O characteristic (b) due to  $C_{par,sub}$  of 480fF, with a  $C_u$  equal to 250fF.

matters, the control circuit has been implemented through a shift register, as shown in Fig. 3.19. The circuit allows the number of modules connected in parallel with  $C_{att}$  to be increased or decreased at any rising edge of the TRIG signal, according to the state of the U/ $\bar{D}$  control bit. This calibration has to be performed at start-up through an off-chip FPGA control logic. A preset bit CAL allows the converter to be configured either in calibration or in conversion mode (Fig. 3.20). In the calibration mode (CAL=1) it is possible to drive all the DAC bottom plates externally, thanks to a digital multiplexer network integrated in the SAR logic. By also providing TRIG,  $U/\bar{D}$  and a flip-flop RESET signal as chip inputs and the comparator outputs as chip outputs, it is then possible to perform the calibration procedure,



Figure 3.18: BWA array provided with calibration modules (a) and details of their implementation (b)



Figure 3.19: Calibration modules control circuit (a) and its working principle (b)



Figure 3.20: Details of the implementation of the 14-bit SAR logic and of its connection to the DAC and the calibration bits.

which the author decided not to describe in this document.
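As a rough illustration of the calibration principle, the sketch below evaluates Eq. (3.12) as a function of the number of connected calibration modules, using the values quoted in the text ($C_u = 250$ fF, extracted $C_{par,sub} = 588$ fF, $C_{att}$ nominally 15% below $C_u$, 64 modules of 1.4 fF). It assumes that each unconnected module adds its capacitance to $C_{par,sub}$ and that Eq. (3.12) can be applied directly; both are simplifications, so the numbers are indicative only and this is not the calibration procedure itself.

```python
# First-order evaluation of Eq. (3.12) versus the number of connected modules.
# Capacitances in fF; the module model is an illustrative assumption.

N_BITS = 14
C_U = 250.0
C_PAR_SUB = 588.0
C_CM = 1.4
N_MODULES = 64
C_ATT_NOMINAL = 0.85 * C_U

def dnl_peak_lsb(n_bits, c_par_sub, c_att, c_u):
    """DNL peak height in LSB according to Eq. (3.12)."""
    num = 2 ** n_bits * c_par_sub - 2 ** (1.5 * n_bits) * (c_att - c_u)
    return num / (2 ** (n_bits + 1) * c_u)

def peak_with_modules(k):
    """DNL peak with k modules in parallel with C_att, the others to ground."""
    c_att = C_ATT_NOMINAL + k * C_CM
    c_par = C_PAR_SUB + (N_MODULES - k) * C_CM
    return dnl_peak_lsb(N_BITS, c_par, c_att, C_U)

best_k = min(range(N_MODULES + 1), key=lambda k: abs(peak_with_modules(k)))
print(f"0 modules : {peak_with_modules(0):+.2f} LSB")
print(f"64 modules: {peak_with_modules(N_MODULES):+.2f} LSB")
print(f"best compensation with about {best_k} modules")
```

Under these assumptions the peak is positive with no module connected and negative with all of them connected, and the zero crossing falls near the middle of the range, of the same order as the optimum found in the post-layout simulations.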

#### SAR logic

Figure 3.20 shows the logic circuit implementing the successive approximation algorithm. It features a first row of dynamic flip-flops (DFFs) and a second row of dynamic differential latches (DDLs) similar to those described in Chapter 4. The design goal was to minimize the capacitive load of the most active signal lines, i.e. the *Valid* signal and the outputs of flip-flops, latches and comparator, in order to reduce the power consumption. Since the chosen switching scheme requires the bottom plates of all capacitors to remain at  $V_{DD}$  during the sampling phase except those of the MSB, the differential dynamic latches were implemented in two different versions (DL1, DL2) with opposite polarity. As in the 10-b design described in Chapter 4, the temporizer is realized by a dynamic flip-flop with a delayed feedback loop from the output to the reset port



Figure 3.21: Temporizer and the timing diagram of its related signals.

(Fig. 3.21). The feedback delay time was set to guarantee a 250ns time window for each bit evaluation, compatible with a conversion rate higher than 100ksps.
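The compatibility between the 250 ns bit slot and the conversion rate can be checked with a one-line estimate. The sketch assumes 14 bit evaluations per conversion and neglects the sampling time, which is not quoted here.

```python
# Upper bound on the conversion rate set by the bit-evaluation window
# (sampling time neglected: an illustrative simplification).

def conversion_time(n_bits, t_bit):
    """Time needed for n_bits successive bit evaluations."""
    return n_bits * t_bit

t_conv = conversion_time(14, 250e-9)
max_rate = 1.0 / t_conv
print(f"t_conv = {t_conv * 1e6:.1f} us -> up to {max_rate / 1e3:.0f} kS/s")
```

The bit evaluations alone take 3.5 µs, which leaves ample room for the sampling phase at 100 kS/s.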

#### Comparator

State-of-the-art efficiency SAR ADCs usually rely on comparators that are composed of a dynamic preamplifier followed by a differential latch [48] or even of a single latch [49] [47] whose input is directly connected to the output of the capacitive DAC. Such topologies avoid static power dissipation (apart from that due to a negligible leakage current) and their noise-speed trade-off depends on the value of the load capacitance of the first (or only) stage. Despite these advantages, their suitability is mainly restricted to high-speed and moderate-to-low resolution converters. When the required resolution is above 12 bit, and especially if the power supply scales down, the LSB shrinks toward values that start to be sensitive to a phenomenon called kickback noise [50]. This consists of the backward propagation of the comparator output during regeneration, through the capacitive coupling between the output and the input nets. In a signal conditioning chain in which all circuit blocks have a low output impedance this would not be a problem, but the capacitive DAC of a SAR ADC has a capacitive output impedance. For this reason the comparator input, in particular in a dynamic implementation, is in principle very susceptible to the capacitive feedthrough from its outputs.

In addition, a dynamic comparator is subject to abrupt variations of the operating conditions of its input MOS pair and of its input capacitance. Since these variations are a function of the gate voltage, they are signal dependent and can cause strong non-linearities.

To deal with these issues, a comparator topology composed by a continuous time preamplifier and a dynamic latch was preferred to a fully dynamic implementation (See Fig.3.22). To further reduce the susceptibility to kickback noise, the first preamplifier features a mirrored architecture (Fig.3.23) providing a gain of 20dB and a bandwidth of 20MHz, which is compatible with the 250ns time slot allocated by the SAR logic temporizer to each bit evaluation and it is able to



Figure 3.22: Two stage comparator composed by a preamplifier and a latch.



Figure 3.23: Transistor level implementation of the comparator.

amplify an input amplitude of the LSB up to 2.2mV, over the second stage latch offset. By setting a preamplifier first-stage tail bias current of  $30\mu$ A the integrated input referred noise has been kept lower than the equivalent quantization noise. To limit the power consumption due to the static power dissipation of the preamplifier, during the conversion the entire comparator works in a power switching regime and it is turned on only during the bit evaluation phase.

# 3.6 Simulation results

The entire AMR SoC is currently under fabrication. For this reason it is not yet possible to report any measurement result, but only post-layout simulation results. Since the ASIC can be divided into the analog front-end and the analog-to-digital converter, and since the design was oriented toward a top-efficiency target, the simulation results are presented accordingly: first the analog front-end performance, then the ADC accuracy and finally a breakdown of the system power consumption.

## 3.6.1 Analog front-end

Figure 3.24 shows the expected spectrum of the analog front-end output noise. When the CDS is off the amplifier works in a linear time-invariant regime. When the CDS technique is activated, the flicker noise contribution is reduced, while the sampling folds into the 50-Hz band



Figure 3.24: Front-end output noise with and without correlated double sampling technique.

the white noise. The noise due to both the bridge resistors and the front-end electronics results in a 4mGa resolution, while the limited output swing leads to a maximum field range of about  $\pm 5$ Ga. A larger range can be achieved by decreasing the gain, at the cost of a lower sensitivity.

#### 3.6.2 ADC

Fig. 3.25a shows the expected DNL and INL of the ADC due to the sub-DAC top-plate parasitic as a function of the output code and of the number of connected calibration modules. When none of the calibration units is connected, the DNL peaks are positive. On the contrary, when all of them are connected, the peaks become negative. With the estimated $C_{par,sub}$ of 588fF resulting from the post-layout extraction performed in Cadence Assura, 28 connected modules give the best compensation of the non-linearity. Fig. 3.25b shows the detail of the non-linearity of the input-output characteristic corresponding to 0, 64 and 31 connected calibration modules, respectively. Considering only the quantization of the converter when optimally compensated for parasitics, and excluding the effect of mismatch, the simulated ENoB is 13.7. To estimate the effect of the mismatch, statistical simulations were performed over 200 runs, showing a mean ENoB of 13.4 bit and a standard deviation of 0.11 bit (Figure 3.26). To these effects, the input switch non-linearity, the comparator non-linearity and its residual noise should also be added. Due to simulation complexity, it was not possible to perform a transistor-level simulation to estimate the dynamic performance of the whole ADC; all the contributions have been evaluated separately and then combined to estimate the global accuracy. The bootstrapped input switches and the DAC used as a sample-and-hold showed a worst-case linearity equivalent to a 17-bit effective resolution. The simulated comparator noise integrated over the entire preamplifier bandwidth is equal to $50\mu$V, below the quantization



Figure 3.25: DNL and INL (a) and a related detail of the I/O characteristic (b) of the ADC with and without calibration.



Figure 3.26: Statistical distribution of the effective number of bit due to parasitic and technological mismatch affecting the DAC.

noise limit. The maximum variation of the comparator input capacitance as a function of the input signal is 230fF; it occurs at the full-scale range and results in an INL that does not exceed 1LSB. Figure 3.27 illustrates the simulated power consumption of the ADC. At a conversion rate of 300Hz, compatible with a system ODR of 100Hz, it is lower than $1\mu$W, thus negligible compared to the available power budget.

#### 3.6.3 Power anatomy

For an operating regime corresponding to an ODR of 100Hz and a system bandwidth of 50Hz, the overall power consumption is 0.54mW, resulting in $180\mu$W per axis with the power anatomy depicted in Figure 3.28, compliant with the requirements. Note that the power consumption is approximately evenly divided between the bridge and the electronics. The largest contribution on the electronics side is given by the first stage of the two input OTAs, which represent 52% of the overall power consumption. The remaining contributions are dominated by the bridge, with 45% of the overall power.



Figure 3.27: Simulated ADC power consumption at different sampling frequencies.

# 3.7 Conclusions

The design of an ASIC in $0.35\mu$m CMOS for the readout of a 3-axis AMR magnetometer was presented. The design and sizing criteria to target an optimized circuit in terms of both accuracy and power consumption have been applied and discussed. The system includes three channels of an analog front-end featuring auto-zeroing amplifiers that operate in a power-switching regime to reduce the power consumption. A digital unit fed by a 32kHz XTAL oscillator manages the offset correction and produces the signals that control the power management of both the sensor and the front-end. A 14-bit ADC receives the multiplexed output from the three channels. The system power consumption is $540\mu$W, i.e. $180\mu$W per axis, at an ODR of 100sps, which guarantees a bandwidth of 50Hz. A synchronization logic signal allows off-chip driving of the sensor's coils to perform an optional SET-RESET operation at a rate of 1Hz.





# Chapter 4 Neural probing System-on-Chip

This chapter introduces the topic which was the object of my design effort across the whole year spent on my master thesis activity and the first year and a half of my PhD. One of the frontiers of microelectronic technology is represented by brain-machine interfaces and, more in general, by applications in the biomedical field. Wireless multi-channel neural recording systems are highly needed in neuroscience experiments with freely-behaving animals to study the complex brain behavior. They are also critical components for visionary neural prosthetic devices [51]. The required high data rate (>10Mb/s) should be reached with minimum power consumption, both because of the limited power budget available in battery-operated systems and to limit tissue necrosis in neuroprosthetics. For this reason, most of the recently reported systems either need on-chip processing of the recorded data to narrow the required bandwidth [28], [29], [52], resulting in a loss of vital information, or limit the transmission range to the mm range [53], thus being impractical for laboratory experiments. This chapter presents for the first time a fully-integrated 130-nm CMOS 64-channel wireless neural recording system featuring a 20-Mb/s data throughput, a 7.5-m TX range with UWB transmission and an overall power consumption of $965\mu$W, corresponding to $15\mu$W per channel. The latter figure-of-merit is the lowest among published multi-channel wireless devices for neuroscience experiments and it has been reached while keeping the TX range in line with the best-in-class wireless neural recording systems. Most of the design efforts have been spent to maximize the efficiency of the channel amplifier, the A/D converter and the UWB transmitter, while still facing the severe issues that the reduction of the power supply imposed in terms of noise, power and transmission range constraints.
The transceiver circuit was conceived and designed at the University of Padua by the team of the ICARUS Lab, guided by Professors Neviani and Bevilacqua, and it will be briefly illustrated in this chapter. As for the ADC, designed at Politecnico di Milano (as was the whole analog front-end), its optimization required a particular design effort. For this reason, the entire Section 4.2 is devoted to its description.

# 4.1 System Architecture and Circuit Implementation

Fig. 4.1(a) shows the overall system architecture. Each channel performs amplification, band-pass filtering and digitization of the incoming signals. The digital data are then serialized by means of parallel-in/serial-out (PISO) shift registers and sent to a high-efficiency UWB transmitter. All the operations are synchronized by a clock manager circuit, composed of an



Figure 4.1: Block scheme of the fully-integrated wireless neural recording SoC.

80-MHz Pierce oscillator and two frequency dividers generating the clock signals for the channel converters (31.25kHz) and for the PISO registers (20MHz). The transmission protocol is managed by a synchronization logic circuit (sync logic in Fig. 4.1(a)) that periodically adds a synchronization header to the data stream in order to correctly reconstruct the recorded signals at the receiver side. A power-on-reset (POR) circuit providing a fast system start-up and a bias reference circuit complete the neural recording unit. The circuit also includes a receiver unit (UWB RX in Fig. 4.1(a)) to be used at the remote side to receive the transmitted signal.

The specifications of each block were derived by taking into account that the typical noise due to electrode and background neural activity is $10-20\mu$V in a 10-kHz band [54] and that the maximum signal (either LFP or AP) is 1-2mV, resulting in an SNR close to 40dB. Therefore, to take a conservative margin, the target input noise of the front-end was set to $5\mu$V, the ADC resolution to 10 bit and the overall gain variable from 40dB to about 60dB, corresponding to an input-referred LSB of $10\mu$V and $1\mu$V, respectively. Finally, the wireless TX was designed to achieve a transmission range larger than 5m, to cope with hostile lab environments.
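The numbers in this specification can be reproduced with a short calculation. The sketch below assumes a differential ADC full-scale range of 1V, i.e. twice the 0.5-V supply adopted in this design; the full-scale value is an assumption made here for illustration and is not stated explicitly in the text.

```python
import math

def snr_db(signal_v: float, noise_v: float) -> float:
    """Signal-to-noise ratio in dB for two voltage amplitudes."""
    return 20 * math.log10(signal_v / noise_v)

def input_referred_lsb(full_scale_v: float, n_bits: int, gain_db: float) -> float:
    """ADC LSB divided by the front-end voltage gain."""
    return full_scale_v / 2 ** n_bits / 10 ** (gain_db / 20)

FULL_SCALE_V = 1.0   # assumption: differential full scale = 2 x 0.5 V supply
N_BITS = 10          # ADC resolution from the text

snr = snr_db(1e-3, 10e-6)                              # 1 mV signal, 10 uV noise
lsb_40db = input_referred_lsb(FULL_SCALE_V, N_BITS, 40.0)
lsb_60db = input_referred_lsb(FULL_SCALE_V, N_BITS, 60.0)

print(f"SNR = {snr:.1f} dB")
print(f"input-referred LSB at 40 dB = {lsb_40db * 1e6:.2f} uV")
print(f"input-referred LSB at 60 dB = {lsb_60db * 1e6:.2f} uV")
```

With these assumptions the input-referred LSB lands within a few percent of the $10\mu$V and $1\mu$V values quoted above.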

To meet these specifications with minimum power dissipation, the supply was lowered to 0.5V while still retaining the optimum performance of the analog blocks. Circuit solutions were combined to get a Power Efficiency Factor (PEF) of the front-end almost 2x better than the state-of-the-art [55]. The ADC was designed to attain a sub-10fJ/c-step efficiency, while a transmission range of more than 5m with an energy-per-bit better than 50pJ/b was achieved by adopting a UWB transceiver.

#### 4.1.1 Analog-front end

To cope with the low power supply, low-noise and wide-swing analog circuit solutions have been investigated. The amplifying chain, as shown in Fig. 4.2, is a cascade of two charge amplifiers providing ac amplification and bandpass filtering from 1Hz to 10kHz, while a 10-bit ADC performs the digitization at a 31.25kSps rate. The first stage is a low-noise amplifier (LNA), ac coupled to reject the offset of the electrode-tissue interface, and features a gain of 40dB, set by the $C_{IN}/C_F$ ratio. A subthreshold PMOS transistor in the feedback path sets the high-pass corner frequency to 0.1Hz. To speed up the amplifier dc stabilization at start-up, its gate is connected to the output of a power-on reset circuit.

To improve the $g_m/I$ ratio, the operational transconductance amplifier (OTA) has been implemented with a current-reuse technique. Thick-oxide input transistors were adopted to reduce the gate current, which could cause the stage saturation. Further power and area reduction



Figure 4.2: Detailed schematic of the recording channel with the 2-stage amplifier and the 10-bit binary-weighted with attenuation capacitor SAR ADC.

were obtained using a self-biased common-mode feedback network in the OTA first stage. Working at low supply, the optimization of the voltage swing is essential. To this aim, the second stage of the OTA features a class-AB output and its common mode voltage is precisely kept at half the supply by a highly linear common mode feedback network with four cross-coupled pseudo-resistors (see inset of Fig. 4.2). The configuration cancels non-linearities arising from differential signals, thus suppressing common-mode artifacts and preserving the available headroom.

The second amplifier (PGA) provides an additional gain of 0-20dB depending on an external control. To always keep the channel bandwidth at about 10kHz, both the input and the Miller-compensation capacitances are switched as a function of the selected gain. Note that the high-pass cut-off frequency of the PGA (1Hz) differs from the high-pass frequency of the LNA (0.1Hz). This choice greatly reduces the input-referred $1/f^2$ noise contribution due to the LNA pseudo-resistors [29]. Regarding the A/D conversion, the low-power requirements were pursued by adopting a 10-bit fully-differential SAR converter with asynchronous logic, dynamic comparator and monotonic switching algorithm. To cope with the area limitation of each channel, a binary weighted with attenuation capacitor (BWA) DAC was chosen, since it is the only topology that makes it possible to adopt standard highly-matched 34fF MiM capacitors, instead of sub-fF full-custom capacitors, for the same total array capacitance of 4.28pF [56]. Two bootstrapped switches sample the incoming signal directly at the comparator inputs, allowing the ADC to operate at a 0.5V supply without linearity degradation.

The 64 channels, sampled at 31.25-kSps, are stored in 64 PISO registers. The digitized data are then serialized at a rate of 20MHz (64ch.×10bit/ch.×31.25kHz) and then sent to the transmitter (TX). The time-division multiplexing (TDM) in the digital domain avoids the use of power-hungry line-buffers with a sequential turn-on procedure [54], which can lead to channel cross-talk.
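The serialized data rate follows directly from the channel count, the resolution and the sampling rate given above. As a minimal sketch, the arithmetic can be checked as follows; the divider ratios are simply what the quoted 80-MHz, 20-MHz and 31.25-kHz frequencies imply and are not taken from the circuit description.

```python
N_CHANNELS = 64           # recording channels
BITS_PER_SAMPLE = 10      # ADC resolution
F_SAMPLE_HZ = 31_250      # per-channel sampling rate
F_OSC_HZ = 80_000_000     # Pierce oscillator frequency

# Time-division multiplexed bit rate at the transmitter input.
bit_rate = N_CHANNELS * BITS_PER_SAMPLE * F_SAMPLE_HZ

# Division ratios implied by the quoted clock frequencies (derived here,
# not confirmed by the circuit description).
piso_divider = F_OSC_HZ // bit_rate
adc_divider = F_OSC_HZ // F_SAMPLE_HZ

print(f"bit rate     = {bit_rate / 1e6:.0f} Mb/s")
print(f"PISO divider = {piso_divider}")
print(f"ADC divider  = {adc_divider}")
```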



Figure 4.3: Schematic of the proposed 10-bit converter.

# 4.2 The charge redistribution SAR ADC analysis and optimization

In energy-limited applications, such as wireless sensor nodes, implantable medical devices or portable entertainment devices, the adoption of ultra-low power circuits is mandatory in order to extend the system battery lifetime. ADCs featuring moderate sampling rate (0.01 - 1 Msps) and resolution (8-10 bit) are key components in such devices. Among different converter architectures, the SAR ADC is the best choice due to its good trade-off among power efficiency, conversion accuracy and design complexity.

In such converters, the primary sources of power consumption are the digital control circuit and the capacitive DAC array. While the digital power consumption benefits from technology advancement, the power consumption due to the capacitive array is limited by the capacitor mismatch, which is almost technology-independent. For this reason, a great number of DAC topologies and switching algorithms have been proposed in order to reduce the DAC power consumption without penalty in terms of accuracy. The latest trend is to rely on the high linearity of the conventional binary weighted (CBW) array, adopting full-custom unit capacitances in the sub-fF range [48,49,57]. In fact, the minimum value of the capacitors supplied by general-purpose design kits is much larger than necessary to meet the linearity requirements, resulting in a considerably large array capacitance and thus in a high switching power. However, this approach requires extra effort to design and model the unit capacitance, or error correction techniques, thereby increasing area and circuit complexity.

In order to investigate the possibility of designing a highly efficient SAR converter without adopting custom unit capacitances, in favor of more reliable standard process capacitors, in this chapter a binary weighted with attenuation capacitor (BWA) array is proposed and optimized. This topology, although often adopted in literature to reduce the DAC capacitance and thus its power



Figure 4.4: Schematic of a N-bit CBW (a) and of a (m + l)-bit BWA (b) capacitive DAC. Also the stray capacitances affecting the arrays are represented.

consumption, is not considered the best choice to achieve high conversion efficiency due to its larger sensitivity to capacitor mismatch. However, taking into account the typically worse matching properties of custom capacitors, the BWA topology adopting standard MiM capacitors can be considered a valuable alternative to the conventional binary-weighted architecture. Thus, the purpose of this work is to demonstrate that a BWA SAR converter can achieve an efficiency well below 10 fJ/conversion-step and a remarkable compactness, without requiring the design, modeling and accurate simulation of custom capacitors. Moreover, by applying for the first time to a BWA array an efficient switching procedure such as the monotonic switching algorithm proposed in [47], the DAC power consumption is further reduced.

Finally, an asynchronous and fully-differential dynamic logic is proposed to minimize the power consumption of the digital logic.

The proposed 10-bit SAR converter (see Fig. 4.3) has been integrated in a 0.13-$\mu$m CMOS technology with a power supply ranging from 0.4 to 0.8 V. At a nominal supply voltage of 0.5 V, the ADC achieves an efficiency of 6 fJ/conversion-step, in line with the best conventional binary weighted topologies but adopting as unit element a standard MiM capacitor of 34 fF.

The ADC description is organized as follows. Section 4.2.1 is devoted to a comparison between the traditional CBW and the bridge-capacitor DAC architecture in terms of power consumption and linearity due to statistical mismatch. The role of the parasitic capacitances in the BWA topology is also highlighted. Section 4.2.2 describes in detail the implementation of the switched capacitor network, of the asynchronous dynamic logic and of the dynamic comparator, while measurement results are shown in Section 4.2.3. Finally, conclusions are drawn in Section 5.9.

#### 4.2.1 Comparison between CBW and BWA topologies

The fundamental building blocks of a SAR ADC are the sample-and-hold circuit, the charge-redistribution DAC, the comparator and the digital logic implementing the successive approximation algorithm. A capacitive network typically serves as both sampling capacitance and feedback DAC, its linearity usually limiting the SAR AD converter performance. The single-ended conventional N-bit binary weighted capacitive array [43] is depicted in Fig. 4.4(a), where $C_u$ is the unit capacitance.

The main alternative to this structure is the binary weighted with attenuation (or bridge) capacitor array, shown in Fig. 4.4(b). It features an attenuation capacitor, $C_{att}$, that divides the array into two binary weighted sub-arrays: a main-DAC and a sub-DAC of m and l capacitors, respectively. In particular, since the use of this topology is often driven by the need of reducing the power consumption, and thus the overall capacitance, we will refer to the BWA architecture with equal main- and sub-DACs (i.e. m=l=N/2) and $C_{att} \cong C_u$, which has been shown to be the most energy efficient among all the possible combinations [45]. In this section, the impact of the capacitance mismatch on the performance of both CBW and BWA arrays is analyzed, as well as the effect of the parasitics that can limit the converter linearity.

### Capacitive mismatch analysis

It is well established that mismatch in the capacitive array degrades the overall performance of SAR converters. Although differential nonlinearity (DNL), integral nonlinearity (INL) and effective number of bits (ENOB) are all important indicators, ENOB is the best metric of the overall system performance [58]. Moreover, when comparing different converter topologies, the most common figure-of-merit (FOM) [6], defined as

$$FOM = \frac{P_{diss}}{2^{ENOB} \cdot f_{sample}},\tag{4.1}$$

relies on the effective number of bits. However, while a precise formulation of the relationship between capacitive mismatch and ENOB is still lacking, the maximum standard deviation of the DNL ($\sigma_{DNL,max}$) and INL ($\sigma_{INL,max}$) has been analytically derived as a function of the unit capacitance relative standard deviation, $\sigma(\frac{\Delta C}{C_u})$, for the most commonly adopted capacitive arrays [45]. Unfortunately, ENOB depends on the distribution of the INL and in particular on the variance of the INL along the output code [58]. Thus, two questions arise: what is the relationship between ENOB and $\sigma_{DNL,max}$ (or $\sigma_{INL,max}$)? And do the CBW and the BWA topologies feature the same ENOB, once $\sigma_{DNL,max}$ has been fixed? In order to answer these questions, statistical simulations have been carried out on both a CBW and a BWA 10-bit charge-redistribution AD converter, assuming that the only contribution to the nonlinearity is due to the capacitive mismatch. The result is shown in Fig. 4.5. For the same maximum standard deviation of the DNL, the two topologies feature practically the same average value and standard deviation of ENOB. Moreover, a $\sigma_{DNL,max}$ lower than 0.5 is enough to limit the average drop of the ENOB to 0.2 bit and its standard deviation to 0.1. This assures that the effective number of bits is always larger than 9.5.
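Eq. (4.1) can be evaluated numerically to get a feeling for the power levels involved. The sketch below is only illustrative: the 1-$\mu$W dissipation is a hypothetical value, while the ENOB of 9.5 and the 31.25-kSps rate are borrowed from the surrounding text as plausible operating points, not as measured results.

```python
def fom_joule_per_step(p_diss_w: float, enob: float, f_sample_hz: float) -> float:
    """Figure of merit of Eq. (4.1): energy per conversion step."""
    return p_diss_w / (2 ** enob * f_sample_hz)

def max_power_for_fom(fom_target_j: float, enob: float, f_sample_hz: float) -> float:
    """Eq. (4.1) solved for the power that meets a target figure of merit."""
    return fom_target_j * 2 ** enob * f_sample_hz

# Hypothetical operating point, for illustration only.
ENOB = 9.5
F_SAMPLE_HZ = 31_250.0

fom_example = fom_joule_per_step(1e-6, ENOB, F_SAMPLE_HZ)   # 1 uW dissipation
p_budget = max_power_for_fom(10e-15, ENOB, F_SAMPLE_HZ)     # 10 fJ/step target

print(f"FOM at 1 uW          = {fom_example * 1e15:.1f} fJ/conversion-step")
print(f"power for 10 fJ/step = {p_budget * 1e9:.0f} nW")
```

At this operating point a 10 fJ/conversion-step target translates into a power budget of a few hundred nanowatts per converter.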

Let us now explore the trade-off between static nonlinearity and power consumption, the latter being proportional to the overall network capacitance [45] to a first-order approximation. Table 4.1 shows the expressions of the overall capacitance, together with $\sigma_{DNL,max}$ and $\sigma_{INL,max}$, as a function of $\sigma(\frac{\Delta C}{C_u})$ [45,59], for an N-bit BWA and CBW single-ended array. The standard deviation of the unit capacitor can be expressed in terms of the Pelgrom mismatch coefficient, $k_c$, and of the specific capacitance, $c_{spec}$, being

$$\sigma\left(\frac{\Delta C}{C_u}\right) = k_c \cdot \sqrt{\frac{c_{spec}}{2C_u}},\tag{4.2}$$



Figure 4.5: Average (a) and standard deviation (b) of ENOB as function of  $\sigma_{DNL,max}$  for a 10-bit CBW and BWA charge redistribution capacitive DAC.

where the factor 2 takes into account that $\Delta C$ is referred to a single capacitance with respect to its nominal value. Considering the same number of bits and the same unit capacitance for both topologies, the single-ended BWA array features an overall capacitance that is approximately a factor $2^{\frac{N}{2}-1}$ lower than in the CBW architecture. Despite this apparent advantage, the BWA array is more sensitive to mismatch than the CBW array, leading to worse nonlinearity performance. As shown in Table 4.1, the effect of mismatch on static nonlinearity is a factor $2^{\frac{N}{4}}$ larger in the BWA array than in the CBW topology. Since $\sigma_{DNL}$ is inversely proportional to the square root of the unit capacitance, the same $\sigma_{DNL,max}$ is achieved in the BWA architecture with a unit capacitance that is a factor $2^{\frac{N}{2}}$ larger than in the conventional array. In this case, the overall array capacitance of the BWA network is approximately twice that of a conventional

Table 4.1: Comparison of the CBW and BWA arrays performance

|                    | CBW                                                               | BWA                                                                |
|--------------------|-------------------------------------------------------------------|--------------------------------------------------------------------|
| $C_{tot}$          | $2^N \cdot C_u$                                                   | $\left(2\cdot\left(2^{\frac{N}{2}}-1\right)+1\right)\cdot C_u$     |
| $\sigma_{DNL,max}$ | $2^{\frac{N}{2}} \cdot \sigma\left(\frac{\Delta C}{C_u}\right)$   | $2^{\frac{3N}{4}} \cdot \sigma\left(\frac{\Delta C}{C_u}\right)$   |
| $\sigma_{INL,max}$ | $2^{\frac{N}{2}-1} \cdot \sigma\left(\frac{\Delta C}{C_u}\right)$ | $2^{\frac{3N}{4}-1} \cdot \sigma\left(\frac{\Delta C}{C_u}\right)$ |

|                                          | $k_c \ [\% \cdot \mu m]$ | $c_{spec} \; [fF/\mu m^2]$ | $k_c^2 c_{spec} \ [\%^2 \cdot fF]$ |
|------------------------------------------|--------------------------|----------------------------|------------------------------------|
| Custom MoM (lateral) [61]                | 4                        | 0.192                      | 3.07                               |
| Custom MoM (1 layer) [62]                | 53                       | 0.12                       | 337                                |
| Custom MoM (2 layers) [62]               | 32                       | 0.24                       | 242                                |
| Custom MoM (lateral) [49]                | 0.5                      | 0.25                       | 0.1                                |
| Standard MiM (130 nm UMC)                | 0.95                     | 1                          | 0.9                                |
| Standard MiM (65 nm ST-Microelectronics) | 0.5                      | 5                          | 1.25                               |

Table 4.2: Capacitance comparison

array, independently of the number of bits.
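The capacitance ratios stated above follow from the $C_{tot}$ row of Table 4.1 and can be checked for N = 10. The following is a minimal sketch, assuming an even number of bits and a normalized unit capacitance.

```python
def c_tot_cbw(n_bits: int, cu: float) -> float:
    """Total capacitance of an N-bit conventional binary weighted array."""
    return 2 ** n_bits * cu

def c_tot_bwa(n_bits: int, cu: float) -> float:
    """Total capacitance of an N-bit BWA array with m = l = N/2 (N even)."""
    half = 2 ** (n_bits // 2)
    return (2 * (half - 1) + 1) * cu

N = 10
CU = 1.0  # normalized unit capacitance of the CBW array

# Same unit capacitance in both arrays: BWA is about 2^(N/2 - 1) times smaller.
same_cu_ratio = c_tot_cbw(N, CU) / c_tot_bwa(N, CU)

# Same sigma_DNL,max: the BWA unit capacitance must be 2^(N/2) times larger.
cu_bwa = 2 ** (N // 2) * CU
same_dnl_ratio = c_tot_bwa(N, cu_bwa) / c_tot_cbw(N, CU)

print(f"C_tot,CBW / C_tot,BWA at equal C_u       = {same_cu_ratio:.2f}")
print(f"C_tot,BWA / C_tot,CBW at equal sigma_DNL = {same_dnl_ratio:.2f}")
```

For N = 10 the first ratio is close to $2^{\frac{N}{2}-1} = 16$ and the second is close to 2, as stated in the text.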

However, considering the traditional switching algorithm [43] for both topologies, the energy consumption depends on the output code. In [45] the average switching energy for the CBW and the BWA topologies is analytically derived. When sized to have the same $\sigma_{DNL,max}$, the average switching energy consumption of the conventional array is a factor 1.91 lower than in the BWA topology, independently of the number of converter bits, confirming that the average power is mainly a function of the overall array capacitance.

Now, let us size a 10-bit capacitive array of a SAR ADC in a technology with a mismatch coefficient $k_c$ of $1\% \cdot \mu m$ and a specific capacitance of $1fF/\mu m^2$. If the array is sized to achieve $3\sigma_{DNL,max} < 0.5$ [57,60], this corresponds to a unit capacitance of about 59 fF for the BWA topology, while this value decreases to 1.8 fF for the conventional array. Moreover, the latest trend is to further shrink the value of the unit capacitance to reduce area occupation and power consumption as much as possible [47–49], relying on the fact that ENOB is not compromised even with a larger $\sigma_{DNL,max}$, as verified by the statistical simulations shown in Fig. 4.5. Since capacitances smaller than 10 fF are not available among standard design-kit MiM and poly capacitors, the CBW topology requires a custom design of the array capacitors and an extra effort for their characterization, which needs dedicated CAD tools [49]. Thus, a significant work of capacitor modeling is required, without producing results accurate and reliable enough to be confidently compared to CMOS industrial standards. Table 4.2 shows a comparison of recently published custom capacitors with MiM capacitances from the 130-nm UMC and 28-nm ST Microelectronics design kits in terms of both Pelgrom coefficient $k_c$ and specific capacitance $c_{spec}$. It is quite evident that custom capacitors can feature significantly worse matching properties than standard MiM capacitances. The only exception is the custom lateral MoM capacitance presented in [57], which shows even better matching properties but required a sophisticated simulation tool to take into account the main cause of mismatch, i.e. the line-edge roughness.
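The 59-fF and 1.8-fF values can be recovered by combining Eq. (4.2) with the $\sigma_{DNL,max}$ row of Table 4.1. The sketch below solves the two expressions for $C_u$; the function and variable names are introduced here for illustration only.

```python
def unit_cap_ff(k_c_pct_um: float, c_spec_ff_um2: float,
                sigma_dnl_max: float, dnl_factor: float) -> float:
    """Unit capacitance (fF) giving the requested sigma_DNL,max.

    dnl_factor is the multiplier of sigma(dC/C_u) in Table 4.1:
    2^(N/2) for the CBW array and 2^(3N/4) for the BWA array.
    """
    k_c = k_c_pct_um / 100.0                  # % * um  ->  um
    sigma_rel = sigma_dnl_max / dnl_factor    # allowed sigma(dC/C_u)
    # Eq. (4.2) solved for C_u: C_u = k_c^2 * c_spec / (2 * sigma_rel^2)
    return k_c ** 2 * c_spec_ff_um2 / (2.0 * sigma_rel ** 2)

N = 10
K_C = 1.0            # % * um, from the sizing example in the text
C_SPEC = 1.0         # fF / um^2, from the sizing example in the text
SIGMA_MAX = 0.5 / 3  # 3 * sigma_DNL,max < 0.5

cu_cbw = unit_cap_ff(K_C, C_SPEC, SIGMA_MAX, 2 ** (N / 2))
cu_bwa = unit_cap_ff(K_C, C_SPEC, SIGMA_MAX, 2 ** (3 * N / 4))

print(f"C_u (CBW) = {cu_cbw:.2f} fF")
print(f"C_u (BWA) = {cu_bwa:.1f} fF")
```

The two results differ by the factor $2^{\frac{N}{2}} = 32$ discussed earlier, which is why the conventional array falls below the 10-fF limit of standard design-kit capacitors.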

Thus, the unit capacitance in the BWA and CBW arrays that assures the same nonlinearity has to be sized taking into account the different  $k_c$  and  $c_{spec}$  parameters of custom and standard capacitors, leading to a ratio between the overall array capacitances of

$$\frac{C_{tot,BWA}}{C_{tot,CBW}} \cong 2 \frac{\left(k_c^2 c_{spec}\right)_{standard}}{\left(k_c^2 c_{spec}\right)_{custom}}.\tag{4.3}$$

The product $k_c^2 c_{spec}$ for all the available custom capacitors as well as for the standard MiM capacitances is also shown in Table 4.2. The adoption of the custom capacitors proposed in [61, 62] results in a larger array capacitance, and thus in a larger power consumption, of the CBW array than of the BWA network featuring standard MiMs. Only with the custom capacitance design in [49] does the CBW array outperform the BWA architecture. In addition, Table 4.2 shows that custom capacitors always present a specific capacitance much lower than that achievable with standard MiMs, even by a factor of 10, making them inefficient in terms of area occupation and making the array layout more critical. These considerations show that there is no evident advantage for the CBW array, suggesting the possibility of designing robust and high-efficiency SAR converters without the need for a custom capacitor design, despite the latest design trend.
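Eq. (4.3) can be applied directly to the $k_c^2 c_{spec}$ column of Table 4.2. The following sketch takes the 130-nm standard MiM as the BWA unit element and compares it with each custom capacitor; the dictionary keys are labels chosen here for readability.

```python
# k_c^2 * c_spec values in %^2 * fF, copied from Table 4.2.
CUSTOM_K2C = {
    "MoM lateral [61]": 3.07,
    "MoM 1 layer [62]": 337.0,
    "MoM 2 layers [62]": 242.0,
    "MoM lateral [49]": 0.1,
}
STANDARD_MIM_130NM = 0.9

def bwa_over_cbw(k2c_standard: float, k2c_custom: float) -> float:
    """Eq. (4.3): C_tot,BWA / C_tot,CBW at equal sigma_DNL,max."""
    return 2.0 * k2c_standard / k2c_custom

ratios = {name: bwa_over_cbw(STANDARD_MIM_130NM, k2c)
          for name, k2c in CUSTOM_K2C.items()}

for name, ratio in ratios.items():
    smaller = "BWA" if ratio < 1.0 else "CBW"
    print(f"{name:18s} ratio = {ratio:8.4f}  (smaller array: {smaller})")
```

A ratio below 1 means that the BWA array with standard MiMs ends up with the smaller total capacitance, which is the case for every entry except the capacitor of [49].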

#### Effect of parasitic capacitances

Let us now discuss the impact of the parasitic capacitances on the linearity performance of the two considered DAC topologies. First of all, it should be noted that the parasitics related to the bottom-plate node of the capacitors do not affect the linearity of the DAC (both CBW and BWA), since this node is always connected to a reference voltage. As for the parasitic capacitances connected to the capacitor top-plate nodes, it is commonly assumed [45] that they can degrade the linearity performance of the BWA converter, even if constant and voltage independent, while in the conventional array they only cause a gain error without affecting the linearity. This statement can be easily verified by expressing the voltage at the output node of the DAC. For the conventional DAC (see Fig. 4.4(a)), the analog output voltage corresponding to a given digital input word ($D_i$ for i = 1, 2, ..., N) can be expressed as a function of the overall DAC capacitance, $C_{tot} = 2^N C_u$, and of the parasitic capacitance connected to the top-plate node, $C_{par}$

$$V_{out} = \frac{\sum_{i=1}^{N} D_i \cdot C_i}{C_{tot} + C_{par}} V_{DD},\tag{4.4}$$

 $C_i$  being equal to  $2^{i-1}C_u$ . As evident from Eq. (4.4), the parasitic capacitance only affects the converter gain.
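The gain-only effect of $C_{par}$ in Eq. (4.4) can be verified numerically. The sketch below evaluates the equation for every code of a 10-bit array with and without a top-plate parasitic; the capacitance values are arbitrary and chosen only for illustration.

```python
def cbw_vout(code: int, n_bits: int, cu: float, c_par: float, vdd: float = 1.0) -> float:
    """Eq. (4.4): output of an N-bit CBW DAC with top-plate parasitic c_par."""
    c_tot = 2 ** n_bits * cu
    switched = sum(((code >> (i - 1)) & 1) * 2 ** (i - 1) * cu
                   for i in range(1, n_bits + 1))
    return switched / (c_tot + c_par) * vdd

def dnl(levels):
    """End-point DNL in LSB for a list of output levels."""
    lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
    return [(hi - lo) / lsb - 1.0 for lo, hi in zip(levels, levels[1:])]

N = 10
CU = 1.0       # arbitrary unit capacitance
C_PAR = 200.0  # arbitrary top-plate parasitic, same units as CU

ideal = [cbw_vout(c, N, CU, 0.0) for c in range(2 ** N)]
loaded = [cbw_vout(c, N, CU, C_PAR) for c in range(2 ** N)]

gain = loaded[-1] / ideal[-1]
worst_dnl = max(abs(d) for d in dnl(loaded))

print(f"gain with parasitic = {gain:.4f}")
print(f"worst-case |DNL|    = {worst_dnl:.2e} LSB")
```

Every level is scaled by the same factor $C_{tot}/(C_{tot}+C_{par})$, so the step size stays uniform and the DNL remains zero up to floating-point rounding.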

A similar expression of the DAC output voltage can be derived for the BWA converter in Fig. 4.4(b) with m=l=N/2 and  $C_{att} = C_u$ . By indicating with  $C_{main}$  and  $C_{sub}$  the overall capacitance of the main- and the sub-DAC, respectively, and with  $C_{par,main}$  and  $C_{par,sub}$  the parasitic capacitances at the top-plate node of the corresponding array, the analog output voltage results

$$V_{out} \cong \left[\frac{\sum_{i=\frac{N}{2}+1}^{N} D_i \cdot C_i}{C_{main} + C_{par,main}} + AR \cdot \frac{\sum_{i=1}^{\frac{N}{2}} D_i \cdot C_i}{C_{sub} + C_{par,sub}}\right] V_{DD},\tag{4.5}$$

 $C_i$  being the capacitance associated to the  $i^{th}$  bit and AR the attenuation ratio

$$AR \cong \frac{C_u}{C_{main} + C_{par,main}}.\tag{4.6}$$

Equations (4.5) and (4.6) show that only the parasitic $C_{par,sub}$ affects the linearity, since its effect on the value of the DAC output voltage is not constant for different input signals, while



Figure 4.6: Simulated DNL (a) and INL (b) for a 10-bit BWA DAC featuring  $C_u=100$  fF and  $C_{par,sub}=50$  fF.

$C_{par,main}$ only causes a gain error. In particular, $C_{par,sub}$ is responsible for a deterministic pattern of the DNL, and hence of the INL. The differential nonlinearity shows a peak every $2^{\frac{N}{2}}$ codes whose amplitude is

$$DNL_{peak} \cong \frac{\left(2^N - 2^{\frac{N}{2}}\right)C_{par,sub} + C_u}{2^N C_u}.\tag{4.7}$$

Equation (4.7) highlights that $C_{par,sub}$ has to be lower than $C_u$ in order to assure a monotonic behavior of the converter, i.e. DNL < 1. In order to verify the accuracy of the proposed analysis, a transistor-level simulation has been performed on a 10-bit single-ended BWA converter with a unit capacitance of 100 fF and a $C_{par,sub}$ of 50 fF. The simulation results are shown in Fig. 4.6. As expected, the DNL has a periodic pattern featuring a peak of about 0.48 LSB every 32 codes, i.e. every $2^{\frac{N}{2}}$ codes, close to the 0.49 LSB predicted by Eq. (4.7), while the INL stays within the range from -0.5 to 0.5 LSB. The simulated ENOB is 9.85, suggesting that the parasitic capacitance at the top-plate of the sub-DAC can be as large as the unit capacitance without determining a severe ENOB drop.
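The periodic DNL pattern can also be reproduced with an idealized charge-conservation model of the bridge array. The sketch below is not the transistor-level simulation of the text: it assumes ideal capacitors, zero initial charge on both top-plate nodes, $C_{att} = C_u$ and no dummy capacitor, and it solves the two node equations in closed form before comparing the resulting peak with Eq. (4.7).

```python
def bwa_levels(n_bits, cu, c_par_sub, c_par_main=0.0, vdd=1.0):
    """Main-DAC top-plate voltage for every code (idealized model)."""
    half = n_bits // 2
    c_att = cu                                # bridge capacitor
    c_arr = (2 ** half - 1) * cu              # main- or sub-array capacitance
    a = c_arr + c_par_main + c_att            # total capacitance at main node
    b = c_arr + c_par_sub + c_att             # total capacitance at sub node
    levels = []
    for code in range(2 ** n_bits):
        q_main = (code >> half) * cu * vdd              # charge from main bits
        q_sub = (code & (2 ** half - 1)) * cu * vdd     # charge from sub bits
        # Charge conservation at the two floating nodes, solved for V_main.
        levels.append((q_main * b + c_att * q_sub) / (a * b - c_att ** 2))
    return levels

def dnl(levels):
    """End-point DNL in LSB; entry k is the step from code k to code k+1."""
    lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
    return [(hi - lo) / lsb - 1.0 for lo, hi in zip(levels, levels[1:])]

def dnl_peak_eq47(n_bits, cu, c_par_sub):
    """DNL peak predicted by Eq. (4.7)."""
    return ((2 ** n_bits - 2 ** (n_bits // 2)) * c_par_sub + cu) / (2 ** n_bits * cu)

N, CU, C_PAR_SUB = 10, 100e-15, 50e-15        # values used in the text

d = dnl(bwa_levels(N, CU, C_PAR_SUB))
peak_model = max(d)
peak_eq47 = dnl_peak_eq47(N, CU, C_PAR_SUB)
peak_codes = [k + 1 for k, x in enumerate(d) if x > 0.4]

print(f"DNL peak, charge model = {peak_model:.3f} LSB")
print(f"DNL peak, Eq. (4.7)    = {peak_eq47:.3f} LSB")
print(f"first peak codes       = {peak_codes[:4]}")
```

Under these assumptions the model gives a peak of roughly 0.48 LSB at every multiple of 32 codes, in line with the simulated value reported above and slightly below the estimate of Eq. (4.7).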

Also the parasitics between top- and bottom-plate nodes of the main capacitors can severely limit the linearity performance of the converter. These parasitic capacitors act as the mismatch affecting the unit capacitance and they are mainly due to the routing paths connecting the capacitor plates. Considering the CBW array, the parasitics affecting the generic capacitance  $C_i$ , namely  $\Delta C_i$ , introduce an error on the analog output voltage that can be evaluated from Eq. (4.4) being

$$\Delta V_{out} \cong \frac{\sum_{i=1}^{N} D_i \cdot \Delta C_i}{C_{tot}} V_{DD}.\tag{4.8}$$

As far as the BWA array is concerned, the error on the output voltage due to parasitics across the capacitors of both main- and sub-DAC can be evaluated from Eq. (4.5)

$$\Delta V_{out} \cong \left[\frac{\sum_{i=\frac{N}{2}+1}^{N} D_i \cdot \Delta C_i}{C_{main}} + AR \cdot \frac{\sum_{i=1}^{\frac{N}{2}} D_i \cdot \Delta C_i}{C_{sub}}\right] V_{DD}.\tag{4.9}$$

Since $C_{main}$ and $C_{sub}$ in the BWA array are approximately equal to the overall capacitance of the CBW array, $C_{tot}$, when sized to have equal $\sigma_{DNL,max}$, the same parasitic capacitance $\Delta C_i$ in the main-DAC affects the output voltage, and thus the linearity, in the same way as in the CBW array. Conversely, the same parasitic capacitance associated to a sub-DAC element has a lower effect than its CBW counterpart, since it is attenuated by the bridge capacitor, i.e. by the ratio $\frac{C_u}{C_{main}} \cong \frac{1}{2^{N/2}}$.

In conclusion, the above analysis shows that the BWA array can achieve linearity performance comparable to the conventional array only if the parasitic capacitance affecting the top-plate node of the sub-DAC is kept lower than the unit capacitance.

### 4.2.2 Circuit design

The scheme of the proposed 10-bit AD converter is shown in Fig. 4.3. In order to achieve a better common-mode noise rejection and less distortion, a fully-differential topology has been adopted. This section describes the three main blocks of the converter, i.e. the capacitive network, the comparator and the asynchronous logic.

### Capacitive array and switching algorithm

In this charge-redistribution based architecture, the capacitor network serves as both sample-and-hold (S/H) circuit and reference DAC capacitor array. Since the converter is fully differential, the capacitive DAC is realized by means of two 10-bit binary weighted arrays with attenuation capacitor, one per branch, with symmetrical main- and sub-DACs. The fully-differential structure reduces $\sigma_{DNL,max}$ and $\sigma_{INL,max}$ by a factor of $\sqrt{2}$ with respect to the single-ended counterpart featuring the same unit capacitance [60], i.e. the same nonlinearity can be obtained with half the unit capacitance.

In order to further reduce the array power consumption, an efficient switching procedure, namely the monotonic algorithm [47], has been applied to the capacitive DAC. In fact, the conventional trial-and-error search procedure [43], even if simple and intuitive, is not energy efficient, especially when unsuccessful trials occur. The proposed capacitive array samples the differential input signal directly on the top plates of the two main-DACs via two bootstrapped switches, which allow low-voltage operation, with the bottom-plates connected to the positive power supply, $V_{DD}$. After the switches are turned off, the first comparison is done without switching any array capacitance. According to the comparator output, the largest capacitor on the main-DAC corresponding to the positive input signal is switched to ground while the other one remains at $V_{DD}$. The ADC repeats the procedure until the LSB is decided. In each cycle only one capacitor is switched, reducing the charge transfer and thus the array power consumption. Note that the proposed BWA array with monotonic switching scheme has the same overall capacitance as the original fully-differential BWA architecture, i.e. about $2^{\frac{N}{2}+2}C_u$.

Figure 4.7: Switching energy versus output code.
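The monotonic procedure described above can be summarized by a short behavioral model. The Python sketch below is an idealized illustration, not the implemented logic: it assumes perfectly binary-weighted steps, so that switching the $i$-th capacitor to ground lowers the corresponding top-plate voltage by $V_{DD}/2^{i+1}$, and it ignores parasitics, mismatch, noise and comparator offset. The function name `monotonic_sar` is hypothetical.

```python
def monotonic_sar(vin_p, vin_n, vdd=0.5, n_bits=10):
    """Ideal fully-differential SAR conversion with monotonic switching.

    Returns the output code and the final comparator common-mode voltage.
    """
    vp, vn = vin_p, vin_n              # top-plate voltages right after sampling
    code = 0
    for i in range(n_bits):
        bit = 1 if vp > vn else 0      # comparator decision
        code = (code << 1) | bit
        if i < n_bits - 1:             # nothing is switched after the LSB decision
            step = vdd / 2 ** (i + 1)  # assumed ideal binary-weighted step
            if bit:
                vp -= step             # capacitor on the higher side goes to ground
            else:
                vn -= step
    return code, 0.5 * (vp + vn)


code, v_cm = monotonic_sar(0.30, 0.20)
print(code, v_cm)
```

Because only one side is pulled down at each step, the comparator common-mode voltage decreases monotonically from $V_{DD}/2$ towards ground during the conversion, which is the behavior that motivates the PMOS input pair of the comparator.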

Figure 4.7 shows the array energy consumption as a function of the ADC output code for the classical and for the monotonic algorithm applied to a BWA architecture featuring the same unit capacitance and reference voltage. The monotonic switching scheme yields a significant efficiency improvement, reducing the average switching energy from 81.5 $C_u V_{DD}^2$ to roughly 32 $C_u V_{DD}^2$, i.e. by a factor of about 2.5 with respect to the traditional switching approach.

Another advantage of the monotonic switching algorithm is that it reduces the effect of capacitor mismatch on the non-linearity of the converter. It can be verified, following the same procedure adopted in [59], that the effect of mismatch on both $\sigma_{DNL,max}$ and $\sigma_{INL,max}$ is reduced by a factor $\sqrt{2}$ with respect to the traditional algorithm. This allows half the unit capacitance to be adopted without impairing the linearity of the DAC, thus further reducing the array power consumption.

Based on statistical simulations, the unit capacitance $C_u$ of the proposed converter was set to 34 fF, close to the technology minimum of 17 fF. The attenuation capacitance was also set equal to $C_u$. Adopting this value for the unit capacitance, the $\sigma_{DNL,max}$ is expected to be lower than 0.5 LSB, while the total capacitance is 4.28 pF. In order to compensate process gradients, the layout of each branch array was designed to keep the functional blocks of the two sub-arrays symmetrical, as shown in Fig. 4.8. Particular care was devoted to the minimization of the sub-DAC top-plate parasitic capacitance, which was limited to about 15 fF, according to the results of the parasitic extraction tool.

Figure 4.8: Adopted layout scheme for the capacitive DAC of one branch (D stands for dummy element).

### Dynamic Comparator

A two-stage dynamic comparator, shown in Fig. 4.9, has been employed since it does not consume static current and is therefore suitable for energy-efficient design. It consists of a first stage similar to the one adopted in [48] followed by a differential latch. Since the monotonic algorithm makes the common-mode input voltage vary from $V_{DD}/2$ to 0 along the conversion cycle [47], the input stage features a PMOS differential pair. The operation is controlled by the *Reset* signal, which is generated by the logic circuit (see Fig. 4.3). Before the comparison takes place, the first stage output nodes are pre-charged low by a high *Reset* signal. Its falling edge stops the pre-charging phase and starts the amplification of the differential input signal. In fact, as *Reset* becomes low, a current starts to flow into the differential pair, charging the parasitic capacitances $C_p$ at the drain nodes of the input transistors. The voltages on the two capacitors increase at different rates, the difference depending on the input signal, $V_{in,p}-V_{in,n}$. As the first stage output nodes approach the threshold voltage of the second stage input transistors, the latch starts to amplify the signal until the positive feedback takes over, providing a rail-to-rail differential output. Consequently, the *Valid* signal is pulled high to enable the asynchronous control logic. According to simulation, the first stage differential gain is about 5, high enough to make the noise of the second stage negligible. Thus, the equivalent input noise of the comparator is mainly determined by the input differential pair, whose transistors work in the weak-inversion region to maximize the comparator efficiency. The equivalent integrated input noise of the comparator is approximately [48]

$$v_{n,rms} \cong \sqrt{\frac{kT}{C_p}} \cdot \sqrt{\frac{8kT}{qV_T}}, \tag{4.10}$$

$V_T$ being the threshold voltage of the second stage input transistors. In order to make this noise negligible with respect to the LSB for the minimum supply voltage of 0.5 V (i.e. $1 \text{ V}/2^N \cong 1$ mV), the parasitic capacitance has to be larger than approximately 3.3 fF. In the proposed comparator, the parasitic capacitance $C_p$ is about 15 fF.

Figure 4.9: Schematic of the dynamic comparator.
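Equation (4.10) can be evaluated numerically to see how the input-referred noise scales with $C_p$. The Python sketch below is a hedged illustration: the threshold voltage `V_T_ASSUMED` = 0.3 V and the temperature of 300 K are assumptions introduced here, since neither value is stated in the text, and the helper names are hypothetical.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant (J/K)
Q_E = 1.602176634e-19    # elementary charge (C)


def comparator_noise_rms(c_p, v_t, temp=300.0):
    """Input-referred rms noise from Eq. (4.10), in volts."""
    kt = K_B * temp
    return math.sqrt(kt / c_p) * math.sqrt(8.0 * kt / (Q_E * v_t))


def min_parasitic_cap(v_noise_max, v_t, temp=300.0):
    """Smallest C_p keeping the noise of Eq. (4.10) below v_noise_max."""
    kt = K_B * temp
    return kt * 8.0 * kt / (Q_E * v_t * v_noise_max ** 2)


V_T_ASSUMED = 0.3        # hypothetical threshold voltage (V), not from the text
v_noise = comparator_noise_rms(15e-15, V_T_ASSUMED)
c_min = min_parasitic_cap(1e-3, V_T_ASSUMED)
print(v_noise, c_min)
```

Since the noise scales as $1/\sqrt{C_p}$, moving from the minimum required capacitance to the 15 fF of the proposed comparator lowers the rms noise by roughly a factor of two, whatever the actual value of $V_T$.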

Another issue related to the proposed converter employing a monotonic switching algorithm is that the common-mode voltage at the comparator input decreases during the conversion cycle. Unfortunately, the dependence of the comparator offset on the common-mode input voltage may result in distortion [47,63]. In fact, the offset voltage of the comparator can be expressed as

$$V_{os} \cong \Delta V_T + \frac{V_{GS} - V_T}{2} \left(\frac{\Delta\beta}{\beta}\right),\tag{4.11}$$

where  $\Delta V_T$  is the threshold voltage offset of the differential pair transistors,  $V_{GS} - V_T$  is the effective voltage of the input pair and  $\Delta\beta$  is the overall conductivity mismatch between the input transistors. The first term is a static offset that does not affect the ADC performance while the second term is a dynamic offset that varies with the input signal common-mode voltage, and thus during the conversion cycle, degrading the converter linearity. The simplest way to reduce its effect is to force these transistors to work in subthreshold region and to increase their area, slightly degrading the comparator power consumption performance.

However, another effect has to be taken into account when sizing the input transistors of the comparator. When the comparator is turned on by a falling edge of the *Reset* signal, the gate capacitance of the two input transistors becomes signal dependent. For example, considering the first comparison, i.e. when the MSB has to be evaluated, the two input voltage signals can be considerably different, the difference being as large as the supply voltage. Thus, when the comparator is turned on, two different gate capacitances are applied directly to the top-plate nodes of the main-DACs, causing a variation of the differential input voltage. Being signal dependent, this effect causes nonlinearity. Moreover, it is a deterministic effect, since it occurs even with perfectly matched input transistors, owing to the dependence of the input gate capacitance on the applied voltage. This effect has been quantitatively assessed by simulating the overall converter with an ideal capacitive array. The simulated INL curves are shown in Fig. 4.10 for three different values of the input transistor dimensions. The INL curve shows a typical S-shape. The effect of the signal-dependent gate capacitance is minimum at the mid-code, since the voltage signal is the same on the top-plate nodes of both the main-DACs. Moreover, this effect is exacerbated for larger transistor sizes, as evident from Fig. 4.10. This suggests using small input transistors, thus trading off the effects of the dynamic offset and of the signal-dependent gate capacitances at the comparator input terminals. In order to size the input transistors, we performed a set of simulations with a size mismatch between the input devices and for different transistor areas. Fig. 4.11 shows the INL curves obtained employing transistors with $W = 1\mu m$, $L = 0.2\mu m$ and $W = 5\mu m$, $L = 1\mu m$. As the ratio between the two areas is 25, we assumed a mismatch on the transistor width of 5% and 1%, i.e. inversely proportional to the square root of the area, for the small and the large area devices, respectively. In the former case, the converter shows missing codes and an INL peak as large as -0.85 LSB. Thus, non-minimum-area transistors ($W = 5\mu m$, $L = 1\mu m$) have been chosen as comparator input devices, in order to limit the effect of the dynamic offset without excessively worsening the deterministic effect of the signal-dependent input capacitances.

Figure 4.10: Deterministic effect of the comparator input capacitance on the INL curve.

Figure 4.11: Effect of the comparator on the INL curve considering a mismatch of the aspect ratio between the input transistors.

Figure 4.12: Logic temporizer.
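The 5% and 1% mismatch values assumed above follow from the usual area scaling of device mismatch, in which the relative mismatch is inversely proportional to the square root of the gate area. The short Python sketch below only checks this arithmetic for the two device sizes quoted in the text; the helper name `scaled_mismatch` is hypothetical.

```python
import math


def scaled_mismatch(sigma_ref, area_ref, area_new):
    """Scale a relative mismatch assuming sigma proportional to 1/sqrt(area)."""
    return sigma_ref * math.sqrt(area_ref / area_new)


small_area = 1.0 * 0.2   # W = 1 um, L = 0.2 um  (um^2)
large_area = 5.0 * 1.0   # W = 5 um, L = 1 um    (um^2)

sigma_large = scaled_mismatch(0.05, small_area, large_area)
print(large_area / small_area, sigma_large)
```

An area ratio of 25 therefore corresponds to a fivefold reduction of the assumed width mismatch.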

### Asynchronous Logic

The SAR logic generates the necessary commands to control the comparator and the capacitive DAC. In order to reduce its power consumption, an asynchronous dynamic logic has been designed. By using a dynamic logic, fewer transistors are needed to implement the same functionality, while being asynchronous it requires only a low-speed sampling clock instead of an oversampled clock, thereby saving power.

Figure 4.13: Timing diagram.

Figure 4.14: Schematic of the asynchronous logic with the details of the dynamic differential latch (DDL) and the dynamic flip-flop (DFF).

The timing is ensured by a logic temporizer, shown in Fig. 4.12, implemented using a dynamic latch (TDL) with a delayed feedback loop. Its function is to enable the comparator, wait for its decision and then reset it for a time long enough to ensure the settling of the DAC voltage. The timing of the asynchronous logic is briefly described in the following (see Fig. 4.13). At the end of the sampling phase, i.e. at the falling edge of the *Sample* signal, the comparator evaluates the most significant bit (MSB) and enables the temporizer through the *Valid* signal, which marks the end of each successful comparison. Only when the bottom-plate nodes of the MSB capacitors are settled does the *Start* signal become high, and it remains in this state until the end of the conversion. Its rising edge triggers the first transition of the TDL output (TQ), causing the reset of the comparator (*Reset*=1) for a time $t_d \approx 75$ ns fixed by the delay unit and forcing *Valid* to zero. At the end of the reset phase, the comparator evaluates the MSB-1 bit and sets the *Valid* signal high. However, the comparator is not reset until the falling edge of TQ has completed the feedback path, i.e. after a time $t_d$. Thus, the *Reset* signal resembles a square wave with a period of $2t_d$. Since the comparator takes approximately $t_{comp} \approx 10$ ns (see Fig. 4.12) for a decision, a time of $t_{sett} = 2t_d - t_{comp} \cong 55$ ns is left to the logic circuit and to the array to switch and settle, respectively, during each bit evaluation phase. When the least significant bit (LSB) has also been evaluated, the *EoC* signal rises and the conversion stops until the end of the next sampling phase, keeping the comparator in the reset state.

Figure 4.14 shows the logic circuit implementing the successive approximation algorithm. It features a first row of dynamic flip-flops (DFFs) and a second row of dynamic differential latches (DDLs). The design goal was to minimize the capacitive load of the most active signal lines, i.e. the *Valid* signal and the outputs of flip-flops, latches and comparator, in order to reduce the power consumption. The DFFs have a $C^2MOS$ structure, with the clock pin connected to 2 n-gates and 2 p-gates. Thus, each DFF output is loaded by 2 n-gates and 3 p-gates, while the DDL outputs are directly connected to a minimum-area inverter that drives the corresponding array capacitors. Moreover, since the monotonic algorithm requires the bottom plates of all capacitors to remain at $V_{DD}$ during the sampling phase, a key choice was to implement the DDLs by means of a differential topology (Fig. 4.14), making it possible to use a single set of 10 elements shared by both arrays, instead of 10 elements per array. In fact, the outputs of the DDLs are kept high until the positive edge of the related DFF output, then they switch according to the comparator decision. Finally, the total differential capacitance loading the comparator is equal to 10x2 n-gates (plus 4 n-gates and 4 p-gates due to the XOR gate generating the *Valid* signal).

### 4.2.3 Measurement Results

The ADC has been fabricated using a two-poly-eight-metal (2P8M) 0.13-$\mu$m CMOS technology featuring 1.11-fF/$\mu$m<sup>2</sup> MiM capacitors. The die photograph is shown in Fig. 4.15. The core occupies 188 $\mu$m x 238 $\mu$m, while two 1-pF capacitors have been added as decoupling capacitances. The performance of the ADC has been measured at the nominal supply voltage of 0.5 V, as well as for $V_{DD}$ ranging from 0.4 V to 0.8 V, on 8 samples from the same wafer. The measurement results are presented below and summarized in Table 4.3.



Figure 4.15: Die photo of the ADC.

### Static performance

The static performance of the ADC in terms of DNL and INL for a supply voltage of 0.5 V is shown in Fig. 4.16. The measured DNL and INL are -0.4/0.5 and -2/2 LSB, respectively. Since each test chip shows a similar pattern and since the effect of the comparator has been carefully minimized, the DNL/INL performance is mainly due to the stray capacitances induced by the layout routing, which has not been carefully optimized. This effect is well captured by accurate post-layout simulations.

### Dynamic performance

Fig. 4.17 shows the output spectra at 0.5-V supply, 200-kSps sampling rate and for an input sinewave at 5.13-kHz and 96.48-kHz frequency, i.e. well below and slightly below the Nyquist frequency. At low frequency, the average measured SNDR and SFDR are 52.6 and 67.5 dB, respectively. The resultant ENOB is 8.45 and its standard deviation is limited to 0.04, considering the 8 tested samples. When the input frequency is increased near the Nyquist rate, the measured SNDR and SFDR drop to 50.8 and 62.1 dB, respectively.
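The ENOB values quoted above follow from the measured SNDR through the standard relation $ENOB = (SNDR - 1.76)/6.02$. The Python lines below simply apply it to the two reported SNDR figures; the function name `enob_from_sndr` is hypothetical.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from the SNDR expressed in dB."""
    return (sndr_db - 1.76) / 6.02


enob_low = enob_from_sndr(52.6)       # 5.13-kHz input
enob_nyquist = enob_from_sndr(50.8)   # 96.48-kHz input
print(round(enob_low, 2), round(enob_nyquist, 2))
```

The 1.8-dB SNDR drop near the Nyquist rate therefore corresponds to a loss of about 0.3 bit.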

The maximum sampling rate increases with the supply voltage, being 50 kSps at 0.4 V and 1 MSps at 0.8 V. The SNDR has been measured for different supply voltages at the maximum sampling frequency. The results in terms of average ENOB are reported in Table 4.3, showing that the dynamic performance does not vary significantly for supply voltages in the 0.4-0.8 V range.

### Power consumption

The power consumption as a function of the sampling rate and for different supply voltages is shown in Fig. 4.18. Based on post-layout simulations, most of the power consumption (50%) is due to the logic and 35% to the comparator. Only 15% of the power is due to the array. The measured leakage current at 0.5-V supply is about 15 nA, thus becoming significant only at low sampling rates.

Figure 4.16: Measured DNL and INL at 0.5-V supply-voltage.

|                               | [49]        | [47]  | [64]  | [60]                | [48]  | [65]  | This work | This work | This work |
|-------------------------------|-------------|-------|-------|---------------------|-------|-------|-----------|-----------|-----------|
| Architecture                  | CBW         | CBW   | BWA   | CBW                 | CBW   | CBW   | BWA       | BWA       | BWA       |
| Technology ($\mu$m)           | 0.065       | 0.13  | 0.18  | 0.065               | 0.065 | 0.065 | 0.13      | 0.13      | 0.13      |
| Unit capacitance (fF)         | 0.25        | 4.8   | 120   | 13.5                | 0.5   | 0.5   | 34        | 34        | 34        |
| Resolution (bit)              | 10/12       | 10    | 10    | 10                  | 10    | 8     | 10        | 10        | 10        |
| Area (mm$^2$)                 | 0.076       | 0.052 | 0.24  | 0.19                | 0.026 | 0.011 | 0.045     | 0.045     | 0.045     |
| Voltage (V)                   | 0.6         | 1.2   | 1     | 1/0.4 $^{\dagger}$  | 1     | 0.6   | 0.4       | 0.5       | 0.8       |
| Sampling rate (MSps)          | 0.04        | 50    | 0.1   | 1                   | 1     | 4.35  | 0.05      | 0.2       | 0.9       |
| ENOB (bit)                    | 9.4/10.1    | 9.18  | 9.4   | 9.1                 | 8.7   | 7.46  | 8.2       | 8.45      | 8.28      |
| Power consumption ($\mu$W)    | 0.072/0.097 | 826   | 3.8   | 0.053               | 1.9   | 6.6   | 0.084     | 0.42      | 5.25      |
| FOM (fJ/conversion-step)      | 2.2/2.7     | 29    | 56    | 94.5                | 4.4   | 8.38  | 5.7       | 6         | 18.8      |
| FOMA (fJ$\cdot$m/conversion-step) | 2.57/3.16 | 11.6  | 74.67 | 276.23              | 1.76  | 1.42  | 1.97      | 2.07      | 6.5       |

Table 4.3: Comparison with the state of the art

 $^\dagger$ A dual supply scheme is adopted: AVDD=1V, DVDD=0.4V.

Figure 4.17: Measured spectrum with an input sine-wave at 5.13 kHz and 96.48 kHz for 200-kSps sampling frequency and 0.5-V supply.

Figure 4.18: Measured power consumption for different supply voltages.
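The statement that leakage matters only at low sampling rates can be illustrated with a first-order power model. The Python sketch below is an assumption-laden illustration rather than a fit to Fig. 4.18: it supposes that the power is the sum of a constant leakage term and a dynamic term proportional to the sampling rate, and it calibrates the energy per conversion on the single measured point of 0.42 $\mu$W at 200 kSps and 0.5 V. All names are hypothetical.

```python
VDD = 0.5          # supply voltage (V)
I_LEAK = 15e-9     # measured leakage current (A)
P_MEAS = 0.42e-6   # measured power at 200 kSps (W)
FS_MEAS = 200e3    # sampling rate of the calibration point (Sps)

P_LEAK = I_LEAK * VDD
E_CONV = (P_MEAS - P_LEAK) / FS_MEAS   # assumed constant energy per conversion (J)


def adc_power(f_s):
    """First-order model: dynamic power proportional to f_s plus leakage."""
    return E_CONV * f_s + P_LEAK


def leakage_fraction(f_s):
    """Share of the total power due to leakage at sampling rate f_s."""
    return P_LEAK / adc_power(f_s)


for f_s in (1e3, 10e3, 200e3):
    print(f_s, adc_power(f_s), leakage_fraction(f_s))
```

Within this simplified model the leakage accounts for less than 2% of the power at 200 kSps, while it dominates at sampling rates of the order of 1 kSps.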

### Performance summary and comparison with the state-of-the-art

To compare the proposed ADC with other works featuring different sampling rates and resolutions, the FOM in Eq. (4.1) is adopted. The FOM of the proposed ADC at the nominal supply of 0.5 V and for a 200-kSps sampling rate is 6 fJ/conversion-step. For a 0.4-V supply and 50-kSps sampling rate it is even lower, being 5.7 fJ/conversion-step, while it increases to 18.8 fJ/conversion-step at 0.8-V supply and 1-MSps sampling rate. Table 4.3 shows that the proposed ADC compares well, in terms of efficiency, with the best recently published works, without adopting a CBW topology or sub-fF custom capacitors, and despite being realized in an older technology. It is worth noting that, since about 80% of the power consumption is due to the digital circuits (asynchronous logic and comparator), the adoption of a more scaled technology would reduce the power consumption, improving the efficiency of the converter. If compared to the converters implemented in 0.13-$\mu m$ technology, the proposed ADC is by far the best in terms of FOM, as shown in Fig. 4.19, which reports the efficiency of published converters as a function of the technology node, from 32-nm to 0.5-$\mu m$ feature size.

The proposed ADC compares favorably with the other converters in Table 4.3 in terms of area, with the exception of the work in [48], which adopts high-density custom capacitors and a conventional binary weighted array. The ADC proposed in [60] features a CBW array with 2 standard MIM capacitors connected in series in order to decrease the unit capacitance, at the cost of a large waste of area and of an efficiency that is not as high as in other CBW realizations.

In order to take into account both the efficiency and the die area of the converter, a figure-of-merit FOMA has been introduced and adopted in literature [66], being defined as

$$FOMA = FOM \cdot \frac{A}{l_{process}},\tag{4.12}$$

where A and  $l_{process}$  are the core area expressed in  $m^2$  and the process minimum length in m, respectively. Even if adopting a BWA architecture and a large unit capacitor, the proposed work favorably compares to the state-of-the-art converters in terms of FOMA.
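The FOM and FOMA entries of the comparison table can be cross-checked from the measured power, ENOB, sampling rate and core area. The Python sketch below assumes that Eq. (4.1) is the usual Walden figure of merit, $P/(2^{ENOB} \cdot f_s)$, which is not restated in this section, and applies Eq. (4.12) with the area in m$^2$ and the process length in m. Function names are hypothetical.

```python
def walden_fom_fj(power_w, enob, f_s):
    """Assumed form of Eq. (4.1): P / (2^ENOB * f_s), in fJ/conversion-step."""
    return power_w / (2.0 ** enob * f_s) * 1e15


def foma_fj_m(fom_fj, area_mm2, l_process_um):
    """Eq. (4.12): FOM * A / l_process, with A in m^2 and l_process in m."""
    area_m2 = area_mm2 * 1e-6
    l_process_m = l_process_um * 1e-6
    return fom_fj * area_m2 / l_process_m


# "This work" columns: (supply V, power W, ENOB, sampling rate Sps)
points = [(0.4, 0.084e-6, 8.2, 50e3),
          (0.5, 0.42e-6, 8.45, 200e3),
          (0.8, 5.25e-6, 8.28, 0.9e6)]

for vdd, p, enob, f_s in points:
    fom = walden_fom_fj(p, enob, f_s)
    print(vdd, round(fom, 1), round(foma_fj_m(fom, 0.045, 0.13), 2))
```

With the tabulated values, the assumed expression returns about 5.7, 6.0 and 18.8 fJ/conversion-step for the three supply voltages, which supports the assumption made on Eq. (4.1).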

### 4.2.4 Conclusions

This work presents a highly efficient SAR ADC in 130-nm CMOS technology. It has been designed adopting a binary-weighted array with attenuation capacitor, featuring a linearity and a total capacitance similar to those of a conventional binary weighted array but without requiring full-custom sub-fF capacitors. The design and the layout of the array have been carefully optimized in order to reduce the parasitic capacitance at the top-plate node of the sub-DAC, which degrades the converter linearity. Moreover, an efficient switching scheme has been adopted in order to further reduce its power consumption. Finally, an asynchronous and fully-differential dynamic logic decreases the transistor count, minimizing the digital power consumption.

A prototype has been integrated and successfully tested, showing an efficiency comparable to that of state-of-the-art converters even if realized in a less scaled technology, which is still one of the most commonly adopted to implement analog front-end ICs.



Figure 4.19: Measured FOM of state-of-the-art SAR ADCs.



Figure 4.20: Simplified schematic of the DCO with the transformer driving the antenna (a) and measured pulse waveform (b).



Figure 4.21: Simplified schematic of the charge-pump circuit.

### 4.2.5 UWB transmitter

The TX adopts an impulse-radio UWB architecture [67] with pulse-position modulation (PPM) and operates in the 7.25-8.5 GHz band, which is unlicensed for UWB communications in Europe, USA and Japan and lies far from WiFi and cellular blockers. The transmission occurs in packets formed by a 640-bit synchronization header and a data payload, whose length can be set up to $1024 \times 640$ bits, resulting in a negligible overhead. Short pulses are generated by turning on for a few ns an 8-GHz digitally-controlled oscillator (DCO) (see Fig. 4.20(a)). This is implemented as an LC-tank oscillator with an NMOS differential pair, which can be tuned thanks to a 4-bit bank of binary weighted MoM capacitances. The DCO operates in a voltage-limited regime with an oscillation amplitude close to $2V_{DD} = 1$ V. The tank inductor is directly coupled to a second inductor. This transformer allows driving the 50-$\Omega$ antenna while enhancing its resistance by a factor of 4. The duration of the pulse, which establishes the bandwidth of the output spectrum, is set by a counter (see Fig. 4.1(a)). Due to the high operating frequency, the counter is powered by a 1.2-V supply generated inside the chip by a fully-integrated charge-pump (CP) clocked at 20 MHz. A simplified schematic of the charge-pump is shown in Fig. 4.21. A switched-capacitor topology was adopted, with two 10-pF flying capacitors ($C_1$ and $C_2$) and a storage capacitor, $C_S = 24$ pF. An auxiliary CP with the flying capacitor value scaled down by 10x and no storage capacitor is used to turn the switch $M_s$ fully on. The CP efficiency, estimated by transistor-level simulations, is 75%, with 84 $\mu$W drawn from the 0.5-V supply. The PPM is accomplished by the TX control unit, which enables the pulse generation on the first or on the second rising edge of the 80-MHz clock occurring within the symbol period.

### 4.2.6 UWB receiver

The chip also hosts a non-coherent UWB impulse-radio receiver (RX), powered by a 1.2-V supply and featuring a sensitivity of 1 aJ per pulse (see Fig. 4.1(a)). Both the high TX efficiency and the high RX sensitivity are key to building this low-power long-range wireless link. The RX features the cascade of a low-noise amplifier and a variable-gain amplifier followed by an energy detector, implemented with a squarer and a windowed integrator, whose output is a voltage sample proportional to the energy of the received pulse.



Figure 4.22: Measured frequency response (a) and input-referred noise (b) for the channel amplifier.

## 4.3 Experimental Results

The 64-channel neural recording SoC has been fabricated in a standard 130-nm CMOS technology. It occupies an area of 25 mm$^2$, including the pads (see Fig. 4.1(b)), and its overall power consumption is 965 $\mu$W from a 0.5-V supply. Fig. 4.22 shows the measured results related to the recording channel. The full chain has a digitally-controlled gain between 40 and 58 dB and an input-referred noise of 5.6 $\mu$Vrms, in accordance with the system specs. The passband ranges from 1 Hz to 10.5 kHz, making it possible to capture both local field potentials and neural spikes. The ADC achieves an 8.45-bit ENoB, a DNL<0.61 LSB, an INL<2 LSB and a 52.6-dB SNDR for an efficiency of 6 fJ/conv. step, which is in line with the performance of state-of-the-art ADCs. Each channel dissipates about 1 $\mu$W (0.93 $\mu$W and 70 nW for the amplifier and for the ADC, respectively), resulting in a channel noise efficiency factor (NEF) of 3.11 and a power efficiency factor (PEF) of 4.84, which improves the result of 9.42 in [55] by nearly 2x. The total power consumption of the analog front-end, including clock and reference circuits, is 490 $\mu$W.

Figure 4.23: ADC output spectrum (a) and static non-linearity (b).

Figure 4.24: Measured BER vs distance curve.

Figure 4.25: (a) Neural trace transmitted by the wireless link and (b) comparison between original and reconstructed spike.

|                        | [68]  | [29]  | [54] | [52]          | [55]  | This work |
|------------------------|-------|-------|------|---------------|-------|-----------|
| Technology ($\mu$m)    | 0.5   | 0.35  | 0.35 | 0.13          | 0.065 | 0.13      |
| Channels               | 64    | 64    | 128  | 64            | 4     | 64        |
| Input noise ($\mu$V)   | 5.1   | 3.1   | 4.9  | 6.5           | 6.5   | 5.6       |
| Resolution (bit)       | 1     | 8     | 9    | 8             | 10    | 10        |
| TX freq. (GHz)         | 0.433 | 0.433 | 4    | 0.915         | 1.5   | 8         |
| Modulation             | 2-FSK | 2-FSK | UWB  | FSK           | LSK   | UWB       |
| Processing             | spike | spike | raw  | FIR filtering | spike | raw       |
| Data rate (Mbps)       | 0.33  | 1.5   | 90   | 1.5           | 0.8   | 20        |
| TX range (m)           | 0.13  | 4     | 1    | 10            | 0.001 | 7.5       |
| Power/ch. ($\mu$W)     | 135   | 269   | 47   | 79            | 2.6   | 15        |

Table 4.4: Comparison of wireless neural recording ICs
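The quoted PEF is consistent with the commonly used definition $PEF = NEF^2 \cdot V_{DD}$, which is assumed here since the text does not restate it. The Python lines below check this relation together with the per-channel power obtained by dividing the total SoC consumption by the number of channels; names are hypothetical.

```python
def power_efficiency_factor(nef, vdd):
    """PEF under the assumed definition NEF^2 * V_DD."""
    return nef ** 2 * vdd


pef = power_efficiency_factor(3.11, 0.5)
power_per_channel_uw = 965.0 / 64      # total SoC power divided by channel count
print(round(pef, 2), round(power_per_channel_uw, 1))
```

The second figure shows that the 15 $\mu$W per channel reported in the comparison table includes the telemetry and the shared circuits, not only the 1 $\mu$W dissipated by the amplifier and the ADC.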

Fig. 4.20(b) shows a measured TX pulse. The TX spectrum achieves a -10 dB bandwidth of 1.1 GHz around 8 GHz and is compliant with the UWB spectral mask, which limits the power spectral density to -41.3 dBm/MHz. The TX power consumption at the nominal 20-Mb/s bit-rate is 470 $\mu$W, mostly due to the DCO (350 $\mu$W). This corresponds to an overall energy consumption of 23.5 pJ/bit and to an efficiency of 11.7%, which is the best so far reported among fully-integrated UWB transmitters. The 2.76 pJ/bit delivered to the external patch antenna enables a transmission up to 7.5 m if a BER=$10^{-3}$ is considered, as shown in Fig. 4.24.
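The energy per bit and the transmitter efficiency follow directly from the figures quoted above. The Python lines below reproduce the arithmetic, taking the efficiency as the ratio between the energy delivered to the antenna and the energy drawn per bit; variable names are hypothetical.

```python
P_TX = 470e-6            # TX power consumption (W)
BIT_RATE = 20e6          # nominal bit rate (bit/s)
E_ANTENNA = 2.76e-12     # energy delivered to the antenna per bit (J)

energy_per_bit = P_TX / BIT_RATE
tx_efficiency = E_ANTENNA / energy_per_bit
print(energy_per_bit * 1e12, tx_efficiency * 100)
```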

Although most of the circuit blocks were individually optimized and their performance verified, the main challenge for these systems is to retain consistent performance when the entire system is operating. Therefore, to verify functional and robust operation, a full system test was performed. In this test, pre-recorded biopotential signals were applied at an input of the system, setting the amplifier gain at the maximum value (58 dB) and placing the receiver at a distance of 3 m. Fig. 4.25 shows one reconstructed waveform after demodulating the wirelessly transmitted data. The comparison between the original trace and the corresponding received waveform shows an excellent quality of the data acquisition and of the wireless link. In Table 4.4 the proposed SoC is compared to other wireless neural recording systems. The implemented device features the lowest power consumption per channel, with the exception of [55], which however features only 4 recording channels, a data rate limited to 1 Mbit/s and a TX range of only 1 mm, thus being impractical for neuroscience experiments with untethered animals.

## 4.4 Conclusions

This chapter presents a 64-channel 0.5-V supply neural recording system-on-chip with 20-Mbps wireless telemetry. The system is able to transmit the recorded neural data at a distance of up to 7.5 m with a power per channel of 15 $\mu$W, which represents the lowest figure among wireless neural recording systems for laboratory experiments, without compromising the signal quality and with a TX range in line with the best-in-class wireless neural recording systems.

# Chapter 5

# Other Activities

## 5.1 Introduction

As the PhD activity required a major design effort on several different projects, little time and few resources were left for developing a minor research activity which, in addition, was not compulsory.

Besides these considerations, the multidisciplinary nature of the applications on which the design activity has been focused gave space to some other scientific and technical activities, which represented a natural in-depth analysis of some critical issues that emerged along the way.

This chapter reports the main side activity, which was developed along with the other PhD projects. It was conceived during the design of the neural probing system-on-chip and consists of a MATLAB-based software tool for the assisted design of charge redistribution SAR ADCs, and in particular of capacitive DACs. This activity was actually the integration (and refinement) on a unified platform of several MATLAB models implemented for the capacitive arrays and the related switching algorithms, with the aim of finding efficient design solutions for the 10-bit SAR analog-to-digital converter of the neural probing SoC.

## 5.2 A tool for the assisted design of charge redistribution SAR ADCs

Efficient analog-to-digital converters (ADCs) are essential building blocks of low-power applications, such as wireless sensor nodes, portable biomedical instruments, health monitoring systems, and a wide variety of consumer electronics products that integrate an increasing number of sensors. In terms of efficiency, for the moderate speed and resolutions that are typically required by most of the aforementioned applications, charge redistribution successive approximation register (CR-SAR) converters are the best choice and dominate the ADC market. In the last decade, starting from the Classic Binary Weighted (CBW) SAR ADC [43], other solutions have been proposed to improve the efficiency [48,49] and adopted in various systems [5,69]. Both static and dynamic performance figures of such converters strongly depend on the nonlinearities determined by mismatch and parasitics affecting the capacitive array of the feedback digital-to-analog converter (DAC, see Fig. 5.1). The impact of mismatch on the Differential-Non-Linearity (DNL) and Integral-Non-Linearity (INL) of the most common array topologies has been studied and formulae are available in literature [45]. However, no quantitative guideline is available to address nonlinearities arising from parasitic capacitances. This effect is deterministic and strongly depends on the array architecture and on the layout quality. Therefore, the impact of parasitics on the converter nonlinearities is addressed and minimized relying on transient simulations performed in an Electronic Design Automation (EDA) tool, such as Cadence. Unfortunately, such a procedure is extremely time-consuming and requires heavy data post-processing to estimate the Signal-to-Noise-and-Distortion-Ratio (SNDR) and the Equivalent Number of Bits (ENoB). Similar issues also arise in many ultra-low-power designs where sub-10fF unit capacitors are adopted [57] [49] [48]. In this case, the impact of the DAC parasitics on the converter power consumption becomes non-negligible and, in a traditional EDA tool environment, its estimate always relies on transient analyses, thus being time-consuming.

Figure 5.1: Generic SAR ADC architecture with a capacitive DAC in the feedback path.

To overcome these limitations, this work proposes a MATLAB-based tool (CSAtool) able to speed up the simulations needed to estimate the ADC static nonlinearities introduced by the DAC non-idealities, their impact on the converter dynamic performance [70] and also on its power consumption. To the best of the author's knowledge, this is the first tool proposed in literature as a valid instrument to assist the design and the analysis of the SAR ADC capacitive array. The tool handles the models of the three most common DAC topologies, namely the Classic Binary Weighted (CBW) [43], the Split Binary Weighted (SBW) [59] and the Binary Weighted with Attenuation Capacitor (BWA) [71] array, both single-ended and fully-differential, using either the conventional switching algorithm [43] or the monotonic scheme [47]. For each of the implemented topologies, the tool models the capacitive array, optionally taking into account the mismatch contribution and/or the parasitics of each capacitance. In this way, CSAtool allows the designer to estimate:

- the impact of the mismatch and parasitics on the static nonlinearity (DNL and INL) with both single and statistical simulations;
- the impact of the mismatch and the parasitics on dynamic nonlinearity (SNDR and ENoB);
- the DAC switching energy, including the parasitics contribution, as a function of the output code.

Figure 5.2: Schematic of an N-bit CBW array.

In a SAR ADC design flow performed in a traditional EDA tool environment, such simulations are usually the most time-consuming; for this reason, adopting the proposed method can dramatically reduce the overall design time.

The modeling approach and the way the converter linearity performance is estimated have already been presented in [72]. The aim of this work is to evaluate the accuracy and the time saving of the proposed tool with respect to traditional Cadence Spectre simulations on three designed ADCs, one per topology. The switching energy modeling and the related simulation results are also presented and compared to those achieved in the Cadence environment. Moreover, measurement results on the two fabricated ADCs, out of the three designed, are reported to assess the validity of the tool.

The chapter is organized as follows. Section 5.3 describes the effects of mismatch and parasitic capacitances on the non-linearity metrics (DNL and INL) in the implemented converter topologies. Section 5.4 sketches the tool algorithm, which is based on the evaluation of the A-to-D input-output characteristic by means of simple static operations on vectors. The models of the different converter architectures are described in detail in Section 5.5, while Section 5.6 describes the algorithm adopted to compute the switching energy. Section 5.7 shows the typical design flow of a SAR converter, highlighting the advantages of adopting CSAtool with respect to the traditional EDA tool-based approach. Section 5.8 compares CSAtool estimates with the Cadence Spectre simulation and measurement results on three designed and two fabricated ADCs. Finally, conclusions are drawn in Section 5.9.

# 5.3 Converter Topologies

The capacitive network adopted in SAR ADCs can be described as a composition of one or more binary weighted arrays connected to the output node either in parallel or through an attenuation capacitance. During the conversion cycle, the switch configuration (see Fig. 5.1) changes to generate the corresponding output voltage. This voltage marks an input transition level between two adjacent digital codes. Therefore, the mismatch and the parasitics of each capacitance affect the conversion accuracy. In the following, the topologies of the converters adopted for the tool validation are briefly described, focusing on the impact of capacitive mismatch and parasitics on the converter performance.



Figure 5.3: Schematic of an N-bit SBW array.

## 5.3.1 Classic Binary Weighted Array (CBW)

Figure 5.2 shows a simple N-bit capacitive CBW array where each capacitive block is oriented with the bottom plate towards the input voltage reference lines to minimize the impact of parasitics. From a formal standpoint, the capacitance of the $i^{th}$ capacitive block of the array is the binary-weighted sum of unit capacitors $C_u$ (i.e. $C_i = 2^{i-1}C_u$) plus the contribution $C_{par,i}$ due to the stray capacitances between the top- and the bottom-plate nodes. The parasitic capacitances between the top plates and a reference voltage contribute to $C_{par,top}$ (see Fig. 5.2), which attenuates the DAC output independently of the code, thus causing a gain error without degrading the converter linearity [45]. Likewise, the stray capacitances between the bottom plates and a reference voltage do not contribute to conversion errors since they are directly driven by the SAR logic drivers.

Parasitic capacitances are deterministic, depending on layout inaccuracies, capacitor geometry and wiring. Thus, once the array is designed, their impact on DNL and INL performance can be assessed by computing the converter characteristic through circuit simulations. On the contrary, the capacitor mismatch causes a statistical error, and analytic expressions are available to estimate the maximum standard deviation of DNL and INL [45,59]. In fact, the capacitive mismatch can be modeled assuming a Gaussian probability distribution of the unit capacitor value with a mean equal to the nominal capacitance, $C_u$, and a standard deviation of

$$\sigma_C = \frac{k_c C_u}{\sqrt{2A}} = k_c \cdot \sqrt{\frac{c_{spec} \cdot C_u}{2}}, \tag{5.1}$$

$k_c$, $A$ and $c_{spec}$ being the Pelgrom mismatch coefficient, the capacitor area and the specific capacitance, respectively. Under this assumption and considering a single-ended CBW array, the maximum DNL standard deviation occurs at the mid-code and is given by [45]

$$\sigma_{DNL,CBW} = 2^{\frac{N}{2}} \cdot \frac{\sigma_C}{C_u}. \tag{5.2}$$

The corresponding maximum standard deviation value for the INL is

$$\sigma_{INL,CBW} = 2^{\frac{N}{2}-1} \cdot \frac{\sigma_C}{C_u}. \tag{5.3}$$

In a fully-differential configuration, these results have to be divided by a factor of  $\sqrt{2}$  [60]. In design practice, the value of the unit capacitor  $C_u$  is set to bring the matching-limited DNL and INL values below the requirements and then the linearity degradation due to parasitic capacitances is assessed by circuit simulations.
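As a quick numerical check of (5.1)-(5.3), the following sketch evaluates the relative unit-capacitor mismatch and the resulting mid-code standard deviations for a single-ended CBW array. The function names are illustrative only; the example values (80-fF unit capacitor, $0.85\,fF/\mu m^2$, $0.45\%\cdot\mu m$) are those quoted in Section 5.8 for the 8-bit CBW prototype.

```python
import math

def sigma_c_rel(k_c, c_spec, c_u):
    """Relative unit-capacitor mismatch sigma_C / C_u, from the second form of (5.1).

    k_c    : Pelgrom coefficient expressed as a fraction times um (0.45 %*um -> 0.0045)
    c_spec : specific capacitance in fF/um^2
    c_u    : unit capacitance in fF
    """
    return k_c * math.sqrt(c_spec / (2.0 * c_u))

def sigma_dnl_cbw(n_bits, rel):
    """Maximum DNL standard deviation (in LSB) of a single-ended CBW array, (5.2)."""
    return 2 ** (n_bits / 2) * rel

def sigma_inl_cbw(n_bits, rel):
    """Maximum INL standard deviation (in LSB) of a single-ended CBW array, (5.3)."""
    return 2 ** (n_bits / 2 - 1) * rel

# 8-bit CBW example with the technology data quoted in Section 5.8.
rel = sigma_c_rel(k_c=0.0045, c_spec=0.85, c_u=80.0)
dnl = sigma_dnl_cbw(8, rel)
inl = sigma_inl_cbw(8, rel)
print(f"sigma_C/C_u = {rel:.3e}")
print(f"sigma_DNL   = {dnl:.3e} LSB")
print(f"sigma_INL   = {inl:.3e} LSB")
```

With these inputs the two standard deviations come out at about $5.25 \cdot 10^{-3}$ and $2.62 \cdot 10^{-3}$ LSB, i.e. the values listed in the "Equation" columns of Table 5.1 for the 8-bit CBW converter.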

### 5.3.2 Split Binary Weighted Array (SBW)

The split DAC topology [44] is shown in Fig. 5.3. It consists of a binary weighted array where the MSB capacitor is implemented by a binary weighted sub-array that mirrors the structure of the remaining capacitive banks. This DAC topology features an improved switching efficiency and also a reduced impact of the capacitor mismatch. In fact, the maximum standard deviation of the DNL, which still occurs at the mid-code, is a factor of $\sqrt{2}$ lower than in the CBW array topology [44], while the maximum standard deviation of the INL is the same as in (5.3), being

$$\sigma_{DNL,SBW} = 2^{\frac{N-1}{2}} \cdot \frac{\sigma_C}{C_u}, \qquad (5.4)$$

$$\sigma_{INL,SBW} = 2^{\frac{N-2}{2}} \cdot \frac{\sigma_C}{C_u}.$$
(5.5)

These relations are referred to a single-ended configuration while a fully-differential topology is a further  $\sqrt{2}$  factor less sensitive to mismatch. Moreover, as in the CBW array, only the parasitics connected between top- and bottom-plate nodes of each array capacitor limit the converter linearity.

### 5.3.3 Binary Weighted with Attenuation Capacitor (BWA)

In a single-ended BWA array, the capacitive network is divided into two binary weighted arrays separated by an attenuation capacitor,  $C_{att}$  (see Fig. 5.4) [73]. In this work, we will consider the case where both DACs have the same number of bits (i.e. m=l=N/2) and  $C_{att} = C_u$ . In fact, this topology leads to the most energy efficient solution [45]. It has been shown that the BWA topology is more sensitive than CBW topology to capacitor mismatch when the same unit capacitance is employed. Closed formulae similar to (5.2) and (5.3) are presented in [45] for the single-ended BWA topology. The maximum  $\sigma_{DNL}$  and  $\sigma_{INL}$  set by mismatch are

$$\sigma_{DNL,BWA} = 2^{\frac{3N}{4}} \cdot \frac{\sigma_C}{C_u}, \tag{5.6}$$

$$\sigma_{INL,BWA} = 2^{\frac{3N}{4}-1} \cdot \frac{\sigma_C}{C_u}.$$
(5.7)

These standard deviations are a factor of $2^{\frac{N}{4}}$ larger than in the CBW array. Regarding the impact of the parasitics, in addition to the top-to-bottom plate capacitances, the stray capacitance connected to the top-plate node of the sub-DAC ($C_{par,sub}$ in Fig. 5.4) also affects the linearity, since it makes the DAC output voltage dependent on the input code. Instead, the parasitic connected to the top plate of the main-DAC, $C_{par,main}$, only affects the converter gain [45].

## 5.4 Tool working principle

The proposed MATLAB-based tool computes the input-output characteristic of charge redistribution SAR ADCs adopting the different DAC topologies. The tool does not simply implement the known equations that estimate the converter nonlinearity (i.e. the maximum standard deviation of DNL and INL) but reproduces the circuit behavior of each specific array topology, handling both array parasitics and capacitive mismatch. Unlike SPICE-like simulators, which solve ordinary differential equations, CSAtool performs static operations (sums and products) among vectors of capacitances and digital words. This approach significantly lightens the computations for circuits, like capacitive DACs, which are composed of simple passive elements and whose accuracy and switching energy can be derived from simple voltage dividers.

Figure 5.4: Schematic of an N-bit BWA array.

Figure 5.5: CSAtool block diagram.

Figure 5.5 shows the block diagram of the proposed tool. Once the number of bits, the switching algorithm and the converter topology are fixed, the tool performs the following steps:

1. implementation of the DAC capacitance model (capacitance vector, $\overline{C}$), optionally adding the parasitics and/or mismatch contribution;
2. evaluation of the DAC output vector ($\overline{DAC_{out}}$), which includes the DAC output voltages corresponding to all the possible switch configurations depending on the selected switching algorithm;
3. extraction of the ADC input-to-output characteristic from the DAC output vector;
4. evaluation of the static metrics (DNL and INL);
5. evaluation of the dynamic metrics (SNDR and ENoB).

The crucial step of the tool algorithm is the implementation of the capacitance vector,  $\overline{C}$ . From a general standpoint, in any charge redistribution SAR converter, each capacitance bank of the array can be written as the sum of different contributions:

$$C_i = 2^{i-1}C_u + \sum_{j=1}^{2^{i-1}} \delta_j + C_{par,i}, \ i = 1, \dots N,$$
(5.8)

where the first term is its nominal value (expressed as the sum of unit elements) and the term  $\delta_j$  represents the mismatch contribution affecting each of the unit capacitors of  $C_i$ . The mismatch contribution is taken into account considering a Gaussian probability function with zero mean value and a standard deviation  $\sigma_C$  as in (5.1). The term  $C_{par,i}$  is the parasitic capacitance of the  $i^{th}$ -capacitive block, obtained by adding the stray capacitances between the top- and the bottom-plate nodes of each unit element of  $C_i$ .
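The construction of the capacitance vector in (5.8) can be sketched as follows. This is a minimal Python illustration and not the CSAtool MATLAB code: the function name, the argument names and the convention that index 0 holds the LSB bank $C_1$ are all assumptions made for the example.

```python
import random

def capacitance_vector(n_bits, c_u, sigma_c=0.0, c_par=None, rng=None):
    """Return [C_1, ..., C_N] following (5.8); element 0 is the LSB bank C_1.

    sigma_c : standard deviation of a single unit capacitor, as in (5.1)
    c_par   : optional list with the top-to-bottom plate parasitic of each bank
    rng     : optional random.Random instance, useful for repeatable runs
    """
    if rng is None:
        rng = random.Random()
    if c_par is None:
        c_par = [0.0] * n_bits
    caps = []
    for i in range(1, n_bits + 1):
        n_units = 2 ** (i - 1)
        # Every unit element of the bank gets its own zero-mean Gaussian error.
        mismatch = 0.0
        if sigma_c > 0.0:
            mismatch = sum(rng.gauss(0.0, sigma_c) for _ in range(n_units))
        caps.append(n_units * c_u + mismatch + c_par[i - 1])
    return caps

# Nominal 4-bit array, then the same array with 1 % unit-capacitor mismatch.
print(capacitance_vector(4, 1.0))
print(capacitance_vector(4, 1.0, sigma_c=0.01, rng=random.Random(0)))
```

Drawing one Gaussian sample per unit element, rather than one per bank, reproduces the $\sqrt{2^{i-1}}$ growth of the absolute error of bank $C_i$ that underlies (5.2) and (5.3).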

Once the vector $\overline{C}$ is known, the next step is to compute the DAC output vector, $\overline{DAC_{out}}$, whose elements are all the voltage transition levels between adjacent codes. In fact, the analog-to-digital conversion is performed by comparing the input signal with subsequent voltage levels generated by the capacitive DAC through a binary search algorithm, as shown in Fig. 5.6. The $\overline{DAC_{out}}$ vector makes it easy to compute the ADC input-to-output characteristic (see Fig. 5.5). The DNL as a function of the output code is then evaluated by computing the vector $\overline{\Delta DAC_{out}}$ of the differences $\Delta DAC_{out}(i)$ between all the adjacent elements of the DAC output vector

$$\Delta DAC_{out}\left(i\right) = DAC_{out}\left(i+1\right) - DAC_{out}\left(i\right) \tag{5.9}$$

as

$$DNL(i) = \frac{\Delta DAC_{out}(i) - \mu \left(\overline{\Delta DAC_{out}}\right)}{\mu \left(\overline{\Delta DAC_{out}}\right)}, i = 0, ..., 2^N - 1$$
(5.10)

where  $\mu\left(\overline{\Delta DAC_{out}}\right)$  is the average of the  $\Delta DAC_{out}(i)$  values. Finally, the INL curve is obtained from the integration of the estimated DNL.
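The evaluation of (5.9) and (5.10), followed by the running sum that yields the INL, can be written in a few lines. The sketch below is in Python and uses an illustrative function name; it assumes that the DAC output vector is supplied as a list of transition levels sorted in ascending order.

```python
def dnl_inl(dac_out):
    """Return (DNL, INL) in LSB from the sorted list of transition levels."""
    # (5.9): differences between adjacent transition levels.
    steps = [b - a for a, b in zip(dac_out, dac_out[1:])]
    mean_step = sum(steps) / len(steps)
    # (5.10): deviation of each step from the average step, normalized to it.
    dnl = [(s - mean_step) / mean_step for s in steps]
    # INL as the cumulative sum (discrete integration) of the DNL.
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl

# 3-bit example: the level that should sit at 0.375 is shifted to 0.4375.
levels = [0.125, 0.25, 0.4375, 0.5, 0.625, 0.75, 0.875]
dnl, inl = dnl_inl(levels)
print(dnl)
print(inl)
```

In this example the shifted level widens one code by half an LSB and narrows the next by the same amount, so the DNL shows a +0.5/-0.5 LSB pair and the INL a single 0.5 LSB excursion.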

As far as the dynamic metrics (SNDR and ENoB) are concerned, the knowledge of the input-output characteristic allows the response of the converter to an input sinewave to be computed. The test-bench is schematically depicted in Fig. 5.5. A sinewave with amplitude varying from 1 to 100% of the full-scale range and with an arbitrary frequency is converted into a digital format on the basis of the input-output characteristic. The digital words are then converted into decimal format and the spectrum is computed by applying the Fast Fourier Transform (FFT) in order to derive the dynamic metrics, the ENoB being a function of the peak SNDR. This procedure can be repeated considering different values of the mismatch contribution, randomly chosen in accordance with the Gaussian probability function, which makes it possible to estimate the statistical properties of the considered converter. Thermal noise, comparator nonlinearity and aperture time jitter of the sampling clock can also limit the dynamic performance of an AD converter. These issues could be taken into account only by complex and time-consuming simulations in EDA tool environments. On the contrary, CSAtool emulates the conversion on the basis of the static input-output characteristic, giving the possibility to estimate the dynamic performance limit imposed by the mismatch and the parasitics of the DAC, which are often the most significant contributions [58].
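The dynamic test-bench described above can be emulated with a short script. The following Python sketch is an illustration under stated assumptions rather than the CSAtool implementation: the input is a coherently sampled sinewave (an integer number of cycles in the record), and, instead of a full FFT, the signal power is taken from a single DFT bin while the noise-plus-distortion power is obtained as the remaining AC power, which for coherent sampling is equivalent to summing all the other FFT bins. The function names, the record length and the number of cycles are arbitrary choices.

```python
import bisect
import math

def quantize(v, levels):
    """Output code = number of transition levels not exceeding the input."""
    return bisect.bisect_right(levels, v)

def sndr_db(levels, fsr=1.0, amplitude=1.0, n_samples=4096, cycles=127):
    """SNDR in dB for a sine input; amplitude is relative to full scale."""
    w = 2.0 * math.pi * cycles / n_samples
    codes = []
    for k in range(n_samples):
        vin = 0.5 * fsr * (1.0 + amplitude * math.sin(w * k))
        codes.append(quantize(vin, levels))
    mean = sum(codes) / n_samples
    ac = [c - mean for c in codes]          # remove the DC component
    # Single-bin DFT at the input frequency (no leakage with coherent sampling).
    re = sum(a * math.cos(w * k) for k, a in enumerate(ac))
    im = sum(a * math.sin(w * k) for k, a in enumerate(ac))
    p_signal = 2.0 * (re * re + im * im) / n_samples ** 2
    p_total = sum(a * a for a in ac) / n_samples
    return 10.0 * math.log10(p_signal / (p_total - p_signal))

def enob(sndr):
    """Equivalent number of bits from the SNDR expressed in dB."""
    return (sndr - 1.76) / 6.02

# Ideal 10-bit characteristic: uniformly spaced transition levels.
ideal = [k / 1024 for k in range(1, 1024)]
peak = sndr_db(ideal)
print(f"SNDR = {peak:.1f} dB, ENoB = {enob(peak):.2f}")
```

Passing the transition levels of a non-ideal array instead of `ideal`, and sweeping `amplitude`, gives an SNDR versus input amplitude curve of the kind discussed in Section 5.8.2.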

## 5.5 Capacitive array model

This section explains in detail the model implementation and the evaluation of the DAC output vector for the CBW, SBW and BWA converters.

## 5.5.1 CBW Model

In a conventional binary weighted topology, the DAC output voltage at each conversion step can be written as

$$DAC_{out} = FSR \cdot H, \tag{5.11}$$



Figure 5.6: Conversion characteristic for a 3-bit single-ended AD converter. The analog input transition levels are set by the DAC output.

where FSR is the full scale range of the converter and H, as shown in Fig. 5.5, is the scalar product

$$H = \frac{1}{C_{tot} + C_{par,top}} \cdot \overline{C} \times \overline{D'}.$$
(5.12)

In (5.12),  $C_{tot}$  is the total capacitance of the array,  $C_{par,top}$  is the parasitic capacitance shown in Fig. 5.2,  $\overline{C}$  is the vector of the array capacitances  $C_i$  and  $\overline{D}$  is the vector of the digital word updated at each conversion cycle:

$$\overline{C} = \begin{bmatrix} C_1 & \dots & C_N \end{bmatrix}, \tag{5.13}$$

$$\overline{D} = \begin{bmatrix} D_1 & \dots & D_N \end{bmatrix}. \tag{5.14}$$

The digital word  $\overline{D}$ , which encodes the DAC output levels at each conversion step, is determined by the adopted switching algorithm. The DAC output vector ( $\overline{DAC_{out}}$  in Fig. 5.5) can be built evaluating (5.11) and (5.12) for all the possible vectors  $\overline{D}$ , which depend on the switching algorithm.
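A minimal Python sketch of (5.11) and (5.12) for a single-ended CBW array with conventional switching is given below. It makes two assumptions that are not stated in the text: the transition level of code $k$ is obtained by using the binary representation of $k$ as the digital word $\overline{D}$, and the array is closed by a termination capacitor (`c_term`) that contributes to $C_{tot}$ but is never switched. Names are illustrative.

```python
def cbw_dac_levels(caps, c_term, c_par_top=0.0, fsr=1.0):
    """Transition levels of a single-ended CBW DAC, per (5.11)-(5.12).

    caps[0] is the LSB bank C_1; c_term is an assumed termination capacitor
    that is counted in C_tot but always stays at the negative reference.
    """
    n_bits = len(caps)
    c_tot = sum(caps) + c_term
    levels = []
    for code in range(1, 2 ** n_bits):
        # Digital word D: bit i tells whether bank C_(i+1) is tied to the reference.
        d = [(code >> i) & 1 for i in range(n_bits)]
        h = sum(c * b for c, b in zip(caps, d)) / (c_tot + c_par_top)   # (5.12)
        levels.append(fsr * h)                                           # (5.11)
    return levels

# Ideal 3-bit array: levels at k/8 of the full scale range.
print(cbw_dac_levels([1.0, 2.0, 4.0], c_term=1.0))
# A top-plate parasitic equal to C_tot halves every level (pure gain error).
print(cbw_dac_levels([1.0, 2.0, 4.0], c_term=1.0, c_par_top=8.0))
```

The second call illustrates the statement of Section 5.3.1: $C_{par,top}$ scales all the levels by the same factor, so the steps remain uniform and the linearity is not affected.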

## 5.5.2 SBW Model

The simple model described in the previous section can be extended to the SBW architecture of Fig. 5.3. The MSB capacitor is implemented as a sub-array and the switching scheme differs from the conventional algorithm [44]. Thus, the DAC output voltage can be expressed as

$$DAC_{out,SBW} = FSR \cdot (H_{MSB} + H_{1,MSB-1}), \qquad (5.15)$$

 $H_{MSB}$  and  $H_{1,MSB-1}$  being coefficients related to the MSB and the residual capacitance array, respectively,

$$H_{MSB} = \frac{1}{C_{tot} + C_{par,top}} \times \overline{C_{MSB}} \times \overline{D'_{MSB}}$$
(5.16)

$$H_{1,MSB-1} = \frac{1}{C_{tot} + C_{par,top}} \times \overline{C_{1,MSB-1}} \times \overline{D'_{1,MSB-1}}.$$
(5.17)

Thus, the conversion voltage level is set by two different N-bit words,  $\overline{D_{MSB}}$  and  $\overline{D_{1,MSB-1}}$ , and two vectors of capacitances,  $\overline{C_{MSB}}$  and  $\overline{C_{1,MSB-1}}$ , related to the MSB sub-array and to the residual array, respectively.

### 5.5.3 BWA Model

In the BWA topology, two equal capacitive arrays must be considered: a main-DAC and a sub-DAC, which are related to the MSBs and the LSBs, respectively. Let us indicate as  $C_{tot,main}$  and  $C_{tot,sub}$  the overall capacitances of the main-DAC and of the sub-DAC, and as  $C_{par,main}$  and  $C_{par,sub}$  the parasitic capacitance at the top-plate node of the corresponding DAC (see Fig. 5.4). Due to the presence of the attenuation capacitor,  $C_{att}$ , the sub-DAC contribution to the overall DAC output voltage is reduced by an attenuation factor

$$AR = \frac{C_{att}}{C_{tot,main} + C_{par,main} + C_{att}},$$
(5.18)

 $C_{tot,main}$  being the total capacitance of the ideal main-DAC, while  $C_{par,main}$  is the parasitic capacitance connected to the top-plate node of the main array. Thus, each DAC output in the BWA topology is evaluated as:

$$DAC_{out} = FSR \cdot (H_{main} + AR \cdot H_{sub}), \qquad (5.19)$$

where  $H_{main}$  and  $H_{sub}$  are coefficients related to the main and sub-DAC, respectively,

$$H_{main} = \frac{1}{C_{par,main} + C_{tot,main} + C_{att}} \cdot \overline{C_{main}} \times \overline{D'_{main}},$$
(5.20)

$$H_{sub} = \frac{1}{C_{tot,sub} + C_{par,sub} + C_{att}} \cdot \overline{C_{sub}} \times \overline{D'_{sub}}.$$
(5.21)

In (5.20) and (5.21),  $\overline{C_{main}}, \overline{C_{sub}}, \overline{D_{main}}$  and  $\overline{D_{sub}}$  are the capacitances and digital output code vectors related to the main- and the sub-DAC, being

$$\overline{C_{main}} = \begin{bmatrix} C_{\frac{N}{2}+1} & \dots & C_N \end{bmatrix}$$
(5.22)

$$\overline{C_{sub}} = \begin{bmatrix} C_1 & \dots & C_{\frac{N}{2}} \end{bmatrix}$$
(5.23)

$$\overline{D_{main}} = \begin{bmatrix} D_{\frac{N}{2}+1} & \dots & D_N \end{bmatrix}$$
(5.24)

$$\overline{D_{sub}} = \begin{bmatrix} D_1 & \dots & D_{\frac{N}{2}} \end{bmatrix}.$$
 (5.25)
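Equations (5.18)-(5.21) can be sketched in Python as follows. The code applies the expressions exactly as written above and is not taken from CSAtool; it assumes that each sub-array contains only its binary weighted banks (no termination capacitor), that index 0 of each list is the least significant bank, and that code $k$ maps onto the binary representation of $k$ split between the two arrays. All names are illustrative.

```python
def bwa_dac_levels(c_main, c_sub, c_att, c_par_main=0.0, c_par_sub=0.0, fsr=1.0):
    """Transition levels of a single-ended BWA DAC, per (5.18)-(5.21)."""
    m, l = len(c_main), len(c_sub)
    den_main = sum(c_main) + c_par_main + c_att
    den_sub = sum(c_sub) + c_par_sub + c_att
    ar = c_att / den_main                                              # (5.18)
    levels = []
    for code in range(1, 2 ** (m + l)):
        d_sub = [(code >> i) & 1 for i in range(l)]                    # LSBs
        d_main = [(code >> (l + i)) & 1 for i in range(m)]             # MSBs
        h_main = sum(c * b for c, b in zip(c_main, d_main)) / den_main # (5.20)
        h_sub = sum(c * b for c, b in zip(c_sub, d_sub)) / den_sub     # (5.21)
        levels.append(fsr * (h_main + ar * h_sub))                     # (5.19)
    return levels

def steps(levels):
    return [b - a for a, b in zip(levels, levels[1:])]

unit = [1.0, 2.0, 4.0]                       # 3 + 3 bit example, C_att = C_u
ideal = bwa_dac_levels(unit, unit, c_att=1.0)
with_main = bwa_dac_levels(unit, unit, c_att=1.0, c_par_main=8.0)
with_sub = bwa_dac_levels(unit, unit, c_att=1.0, c_par_sub=8.0)
print(min(steps(ideal)), max(steps(ideal)))          # uniform steps
print(min(steps(with_main)), max(steps(with_main)))  # uniform, smaller gain
print(min(steps(with_sub)), max(steps(with_sub)))    # non-uniform steps
```

The three cases reproduce the qualitative behavior described in Section 5.3.3: a parasitic on the main-DAC top plate only rescales the characteristic, while a parasitic on the sub-DAC top plate makes the step size depend on the code.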



Figure 5.7:  $4^{th}$  bit evaluation step of a 6-bit CBW converter. The capacitance  $C_4$  is switched to  $V_{DD}$ .

## 5.6 Switching energy computation

The proposed tool also allows the DAC switching energy to be computed as a function of the output code for all the handled array topologies. To this aim, the same static approach adopted to compute the DAC output as a function of the digital code is employed. Also in this case, mismatch and parasitic contributions can be taken into account.

Without loss of generality, the approach adopted for the energy estimation is illustrated in the following with reference to a 6-bit CBW topology. Figure 5.7 shows a single-ended 6-bit CBW array for a particular configuration of the switches and considering power supply and ground as positive and negative reference voltage, respectively. Each configuration of the switches yields an output voltage, which corresponds to a transition level between two adjacent digital codes, and determines the charge or the discharge of the array capacitances. In particular, Fig. 5.7 shows the switch configuration when the 4<sup>th</sup> bit is evaluated. The energy spent by the power supply can be evaluated considering the charge variation of all the capacitances that are connected to $V_{DD}$ at the end of the considered conversion step. For the generic $j^{th}$-bit evaluation step, the energy absorbed from the power supply is

$$E(j) = \left[C_{j}(V_{DD} - V_{out}(j) + V_{out}(j+1))\right]V_{DD} + \sum_{m} \left[C_{m}(-V_{out}(j) + V_{out}(j+1))\right]V_{DD},$$
(5.26)

where  $V_{out}(j)$  and  $V_{out}(j+1)$  are the output voltages corresponding to the  $j^{th}$ - and  $(j+1)^{th}$ -bit evaluation phase. In (5.26), the first term refers to the  $j^{th}$  capacitance, whose bottom plate is switched from ground to  $V_{DD}$ , while the summation refers to the capacitors whose bottom plate remains at  $V_{DD}$  across the  $(j+1)^{th}$ - and  $j^{th}$ -bit evaluation steps. These capacitances contribute to the energy drawn from power supply because of the variation of the output voltage, that changes from  $V_{out}(j+1)$  to  $V_{out}(j)$ . For the case depicted in Fig. 5.7, only  $C_4$ , which is the switched capacitance, and  $C_1$  contribute to the energy drawn from the power supply and (5.26) reduces to

$$E(4) = \left[C_4 \left(V_{DD} - V_{out} \left(4\right) + V_{out} \left(5\right)\right)\right] V_{DD} + \left[C_1 \left(-V_{out} \left(4\right) + V_{out} \left(5\right)\right)\right] V_{DD}. \tag{5.27}$$

The implemented models compute the switching energy at each step on the basis of the DAC output voltage variation, which can be easily evaluated in CSAtool for each of the  $2^N$  possible output codes. The overall switching energy is then obtained as the sum of the energies spent at all the conversion steps.
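The step-by-step energy accounting of (5.26) can be illustrated with the Python sketch below for a single-ended CBW array with conventional switching. It is a simplified model built on several assumptions that go beyond the text: index 0 of `caps` is the LSB bank, the conversion proceeds from the MSB to the LSB, all bottom plates are at ground before the first step, a never-switched termination capacitor `c_term` closes the array, and the offset that the sampled input adds to the top-plate voltage is ignored because only voltage differences enter (5.26). Function and variable names are illustrative.

```python
def cbw_switching_energy(code, caps, c_term, vdd=1.0):
    """Energy drawn from V_DD during one conversion ending in `code`."""
    n_bits = len(caps)
    c_tot = sum(caps) + c_term
    decided = set()            # banks left at V_DD by previous bit decisions
    prev_on, prev_v = set(), 0.0
    energy = 0.0
    for j in range(n_bits - 1, -1, -1):          # MSB first
        on = decided | {j}                        # trial configuration for bit j
        v = vdd * sum(caps[i] for i in on) / c_tot
        dv = v - prev_v                           # change of the DAC output
        for i in on:
            if i in prev_on:
                # Bank stays at V_DD: summation term of (5.26).
                energy += caps[i] * (-dv) * vdd
            else:
                # Bank switched from ground to V_DD: first term of (5.26).
                energy += caps[i] * (vdd - dv) * vdd
        if (code >> j) & 1:
            decided.add(j)                        # bit kept, bank stays at V_DD
        prev_on, prev_v = on, v
    return energy

# 2-bit example with unit capacitor 1 and V_DD = 1 (energies in units of C_u*V_DD^2).
for code in range(4):
    print(code, cbw_switching_energy(code, [1.0, 2.0], c_term=1.0))
```

In this model a bank that is returned to ground after a "0" decision draws no charge from the supply, so the energy of a conversion depends on all the bit decisions except the last one.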



Figure 5.8: Typical SAR ADC design flow with a comparison between traditional approach and CSAtool performance.

# 5.7 Design Flow

The typical design flow (see Fig. 5.8) of a SAR ADC adopting a capacitive DAC requires several steps. The first is the design and the simulation of the converter schematic to verify its correct operation and its linearity performance. Usually, the topology and the unit capacitance are chosen a priori to meet the required linearity specs (e.g. $3\sigma_{DNL} < 0.5$ [57,60]). Once the converter schematic has been established, the layout can be drawn and the stray capacitances extracted with the aid of a parasitic extraction tool. At this point, the same simulations performed on the schematic must be repeated on the post-layout view of the converter to ensure that the parasitics do not degrade the linearity performance. To estimate the mismatch effect, Monte Carlo simulations should also be performed at this step. Since the layout is rarely satisfactory at the first attempt and it is hard to analytically predict, and thus minimize, the effect of the parasitics, post-layout simulations should be repeated until the linearity requirements are met.

In particular, to evaluate the static characteristic with the traditional Spice-like simulators, a full-scale ramp is applied to the input of the ADC as shown in Fig. 5.9(a). To reduce the simulation time, behavioral models (Verilog or VerilogA) of the comparator and of the SAR logic circuit are adopted and only the ADC input and output signals are saved. The strobe and the sampling period are set short enough to guarantee at least 100 points per conversion level, thus keeping the systematic error on the DNL below 1%. On the contrary, in CSAtool the input-output characteristic is directly evaluated on the basis of the DAC output voltage levels by means of static operations among vectors, as shown in Section 5.4. Once the characteristic is given, the static metrics are easily derived.

As far as the dynamic metrics estimation is concerned, in a traditional test-bench based on transient simulation (see Fig. 5.9(b)), the converter has to be fed by an analog sine waveform according to Shannon's law. Its digital output is evaluated over a desired number of samples, which sets the simulation time. The latter can vary from tens of minutes to a few hours. Then, the dynamic metrics, like SNDR and ENoB, can be evaluated by exporting the output data to MATLAB in order to perform an FFT. On the contrary, in the CSAtool environment, the output sinewave is directly obtained according to the estimated input-to-output characteristic. The transient simulations adopted in the traditional design flow are time-consuming, even if the converter is simplified by adopting behavioral models for the comparator and the SAR logic. Instead, by adopting CSAtool, the time needed to compute the dynamic metrics can be significantly reduced, which also makes it possible to perform statistical simulations that would otherwise be impractical.

Figure 5.9: Schematics of the a) static and b) dynamic performance evaluation with a traditional Spice-like simulator.

## 5.8 Simulation and Measurements Results

In this section, CSAtool results for three designed SAR ADC prototypes are shown and compared to both analytical expressions and Cadence simulations in terms of accuracy and computation time. To isolate the DAC contribution to nonlinearities, all Cadence Virtuoso test-benches were created adopting a VerilogA description for the logic circuit and the comparator. The designed prototypes are:

- a 10-bit fully-differential SBW SAR ADC implemented in a 0.35-$\mu$m CMOS AMS process adopting 23-fF poly-insulator-poly (PiP) unit capacitors with a specific capacitance of $0.85\,fF/\mu m^2$ and a Pelgrom coefficient of $0.45\%\cdot\mu m$;
- an 8-bit CBW SAR ADC designed in a 0.35-$\mu$m CMOS AMS process employing 80-fF PiP unit capacitors with a specific capacitance of $0.85\,fF/\mu m^2$ and a Pelgrom coefficient of $0.45\%\cdot\mu m$ [29];
- a 10-bit BWA SAR ADC featuring a monotonic switching procedure and implemented in a 130-nm CMOS UMC process [74] with 34-fF MiM unit capacitors having a specific capacitance of $1\,fF/\mu m^2$ and a Pelgrom coefficient of $1\%\cdot\mu m$.

Figure 5.10: Die photograph of the two measured prototypes adopting a) an 8-bit single-ended CBW and b) a 10-bit fully-differential BWA DAC.

Figure 5.11: Layout of the DAC of the prototyped SBW charge redistribution converter with the detail of the connections between the adopted PiP capacitors.

The last two converters were also fabricated as prototypes within the framework of different research projects, while the first one was only designed and laid out. The die micro-photographs of the fabricated converters and the layout view of the SBW converter are shown in Fig. 5.10 and Fig. 5.11, respectively. For each converter, CSAtool and Cadence post-layout simulations were compared. For the two fabricated converters, measurement results were also compared to simulations. It is worth noting that a comparison between CSAtool and Cadence simulations is in general enough to validate the proposed tool; on the other hand, a good match with the fabricated circuit performance provides further evidence of the reliability and accuracy of all the adopted simulation methodologies. This accuracy, together with the significant reduction of the simulation time, makes CSAtool a suitable alternative to Cadence Spectre simulations for SAR ADC nonlinearity estimations.



Figure 5.12: Comparison between DNL and INL of the 10-bit SBW ADC prototype estimated by Cadence Spectre simulations (black lines) and by CSAtool (red lines).



Figure 5.13: Comparison between DNL and INL characteristics of the 8-bit CBW ADC prototype estimated by Cadence Spectre simulations (black lines) and by CSAtool (red lines).


Figure 5.14: Measured DNL and INL of the fabricated 8-bit CBW ADC prototype



Figure 5.15: Comparison between DNL and INL characteristics of the 10-bit BWA ADC prototype estimated by Cadence Spectre simulations (black lines) and by CSAtool (red lines).



Figure 5.16: Effect of floating dummy capacitors on the 10-bit BWA array.

#### 5.8.1 Static Metrics

Post-layout simulations for the 10-bit SBW topology performed with Cadence Spectre simulator have been compared to the results of CSAtool. In these simulations, mismatch was not considered and only the parasitic capacitances, extracted from the layout view, were taken into account. Figure 5.12 shows the comparison of the estimated static metrics for the SBW DAC, highlighting excellent matching, since the difference is always less than 0.005 LSB for both the DNL and the INL curves, which is the resolution limit of the test-bench implemented in Cadence.

Similarly, Fig. 5.13 shows the comparison between the DNL and INL characteristics obtained by CSAtool and Cadence Spectre simulations for the 8-bit CBW converter. The matching is excellent with a maximum error below 0.02 LSB, confirming the good accuracy of the implemented converter model. Figure 5.14 shows the measured static performance of the 8-bit converter. The DNL curve shows a good matching with both Cadence and CSAtool estimations, while the INL, despite a similar pattern, drifts from the simulation results for the innermost and the outermost codes. This difference is mainly due to the effect of the comparator non-linearity. Figure 5.15 shows the comparison between Cadence post-layout simulations and CSAtool results for the 10-bit fully differential BWA converter. A difference between the estimated static nonlinearities of up to 0.1 and 0.25 LSB for the DNL and the INL, respectively, can be observed. This difference has to be ascribed to the floating dummy capacitors that surround the array to improve the matching. These floating dummies create a large number of cross-coupled parasitic capacitances, as shown in Fig. 5.16, which are difficult to identify and model, thus making the comparison between Cadence and CSAtool results unfair. However, the discrepancy is drastically reduced as soon as the dummy capacitors are connected to ground or to a reference voltage, with an error always lower than 0.05 LSB. Despite this problem, the measured results on the 10-bit converter shown in Fig. 5.17 show a good matching in terms of DNL with the simulated curves, while the measured INL differs considerably, even if it shows a similar pattern.

Figure 5.18 shows the standard deviation of DNL and INL along the output codes for the 8- and 10-bit converters, evaluated by CSAtool considering the technology capacitive mismatch. This analysis is impractical in the Cadence environment, since it requires at least 100 static characteristic simulations (see Section 5.7) to achieve statistically reliable results. Therefore, the results of CSAtool are compared in Table 5.1 to the analytic expressions of the maximum DNL and INL standard deviation available in literature [45,59] and reported in Section 5.3. The discrepancy is always lower than 0.005 LSB. It is worth pointing out that, for the 10-bit BWA prototype, the adopted monotonic switching procedure reduces the maximum DNL standard deviation by a factor of 2 with respect to the traditional switching algorithm reported in [45].

Figure 5.17: Measured DNL and INL of the fabricated 10-bit BWA ADC prototype.

Figure 5.18: Standard deviation of DNL and INL as a function of the output code for the a) 8- and the b) 10-bit SBW and c) BWA converters considering the technology capacitive mismatch.

| Topology        | $\sigma_{DNL,max}$ (CSAtool) | $\sigma_{DNL,max}$ (Equation) | $\sigma_{INL,max}$ (CSAtool) | $\sigma_{INL,max}$ (Equation) |
|-----------------|------------------------------|-------------------------------|------------------------------|-------------------------------|
| 8-bit CBW s.e.  | $4.93 \cdot 10^{-3}$         | $5.25 \cdot 10^{-3}$          | $2.51 \cdot 10^{-3}$         | $2.62 \cdot 10^{-3}$          |
| 10-bit SBW f.d. | $9.89 \cdot 10^{-3}$         | $9.74 \cdot 10^{-3}$          | $7.05 \cdot 10^{-3}$         | $6.89 \cdot 10^{-3}$          |
| 10-bit BWA f.d. | $84 \cdot 10^{-3}$           | $81.4 \cdot 10^{-3}$          | $84.4 \cdot 10^{-3}$         | $81.4 \cdot 10^{-3}$          |

Table 5.1: Estimates of $\sigma_{DNL,max}$ and $\sigma_{INL,max}$ (in LSB)
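
The kind of mismatch analysis summarized in Table 5.1 can be illustrated with a small MonteCarlo experiment. The following Python sketch is not CSAtool: it assumes a single-ended conventional binary-weighted array without parasitics, Gaussian and independent unit-capacitor errors, and a hypothetical relative mismatch of 1%. The function names are illustrative. For this simplified model, the well-known first-order mid-code estimates are roughly $2^{N/2}\,\sigma_u/C_u$ for the DNL and $2^{N/2-1}\,\sigma_u/C_u$ for the INL; they are printed next to the MonteCarlo values for comparison and are not necessarily the exact expressions of Section 5.3.

```python
import math
import random


def dac_levels(n_bits, sigma_rel, rng):
    """DAC output levels, in ideal LSB, of one randomly mismatched array."""
    # Capacitor i groups 2**i unit elements: mean 2**i, std sqrt(2**i)*sigma_rel.
    caps = [rng.gauss(2 ** i, math.sqrt(2 ** i) * sigma_rel)
            for i in range(n_bits)]
    dummy = rng.gauss(1.0, sigma_rel)  # terminating unit capacitor
    c_tot = sum(caps) + dummy
    scale = 2 ** n_bits / c_tot
    return [scale * sum(c for i, c in enumerate(caps) if (code >> i) & 1)
            for code in range(2 ** n_bits)]


def std(values):
    """Population standard deviation of a sequence of floats."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def max_sigma_dnl_inl(n_bits, sigma_rel, runs=200, seed=1):
    """MonteCarlo estimate of the worst-case DNL and INL standard deviation."""
    rng = random.Random(seed)
    dnl_runs, inl_runs = [], []
    for _ in range(runs):
        lv = dac_levels(n_bits, sigma_rel, rng)
        dnl_runs.append([lv[k] - lv[k - 1] - 1.0 for k in range(1, len(lv))])
        inl_runs.append([lv[k] - k for k in range(len(lv))])
    # zip(*runs) regroups the data per output code across all runs.
    sigma_dnl = max(std(per_code) for per_code in zip(*dnl_runs))
    sigma_inl = max(std(per_code) for per_code in zip(*inl_runs))
    return sigma_dnl, sigma_inl


if __name__ == "__main__":
    n_bits, sigma_rel = 8, 0.01  # hypothetical 1% unit-capacitor mismatch
    s_dnl, s_inl = max_sigma_dnl_inl(n_bits, sigma_rel)
    print(f"MonteCarlo : DNL {s_dnl:.3f} LSB, INL {s_inl:.3f} LSB")
    print(f"First order: DNL {2 ** (n_bits / 2) * sigma_rel:.3f} LSB, "
          f"INL {2 ** (n_bits / 2 - 1) * sigma_rel:.3f} LSB")
```

With a few hundred runs the MonteCarlo estimates are expected to fall within a few percent of the first-order values, which is the same kind of cross-check reported in Table 5.1.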

#### 5.8.2 Dynamic Metrics

Figure 5.19 shows the SNDR as a function of the input signal amplitude (referred to the full-scale range) for the three implemented converters, evaluated by means of Cadence Spectre and CSAtool simulations. The effect of mismatch is not taken into account. The discrepancy between Cadence and CSAtool results is always lower than 2 dB. It is worth pointing out that CSAtool makes it easy to compute the whole SNDR vs. input amplitude curve, while the same analysis in a conventional EDA tool environment is so time consuming that only a few points of the dynamic characteristic can be computed. This can lead to an incorrect evaluation of the peak SNDR and thus of the ENoB.
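
To give a feeling for what an SNDR vs. input amplitude sweep involves, the sketch below computes the curve for an ideal quantizer. It is only an illustration under strong assumptions: the converter is modeled as an ideal mid-rise quantizer (no capacitive DAC, mismatch, noise or comparator effects), the sine wave is coherently sampled, and the function names and default sample counts are hypothetical. It does not reproduce the CSAtool models.

```python
import math


def quantize(x, n_bits):
    """Ideal mid-rise quantizer over [-1, 1); returns the reconstructed level."""
    lsb = 2.0 / 2 ** n_bits
    code = math.floor(x / lsb)
    # Clip to the available codes.
    code = max(-2 ** (n_bits - 1), min(2 ** (n_bits - 1) - 1, code))
    return (code + 0.5) * lsb


def sndr_db(n_bits, amplitude, n_samples=2048, cycles=127):
    """SNDR of a coherently sampled sine wave after ideal quantization."""
    phase = [2.0 * math.pi * cycles * k / n_samples for k in range(n_samples)]
    out = [quantize(amplitude * math.sin(p), n_bits) for p in phase]
    mean = sum(out) / n_samples
    # Projection on the fundamental (single-bin DFT, exact when the number
    # of cycles and the number of samples are coprime).
    i_part = 2.0 / n_samples * sum(v * math.cos(p) for v, p in zip(out, phase))
    q_part = 2.0 / n_samples * sum(v * math.sin(p) for v, p in zip(out, phase))
    p_signal = (i_part ** 2 + q_part ** 2) / 2.0
    p_total = sum((v - mean) ** 2 for v in out) / n_samples
    # Everything that is neither DC nor fundamental counts as noise + distortion.
    return 10.0 * math.log10(p_signal / (p_total - p_signal))


if __name__ == "__main__":
    for dbfs in (-40, -30, -20, -10, -6, -3, -1, 0):
        amp = 10.0 ** (dbfs / 20.0)
        print(f"{dbfs:4d} dBFS -> SNDR = {sndr_db(8, amp):5.1f} dB")
```

Each point of the curve requires a full set of converted samples, which is why a transistor-level simulator can afford only a few amplitudes while a behavioral model can sweep the whole range.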

The measured and the simulated SNDR have also been compared for the two fabricated ADCs. The 8-bit CBW converter achieves a measured SNDR of 46 dB, while the 10-bit BWA converter achieves 52.6 dB. The corresponding CSAtool estimates, obtained over 100 runs, are 49 dB $\pm$ 1.7 dB and 56 dB $\pm$ 2 dB, respectively. These differences between the simulated values and the measurements were expected, due to comparator nonlinearity, residual noise and dynamic effects, which are considered neither in the tool DAC modeling nor in the adopted Cadence test-benches. However, the achieved performance is included in a $2\sigma$-width interval around the mean value.
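
The numbers quoted above can be turned into effective resolution with the standard relation $\mathrm{ENoB} = (\mathrm{SNDR} - 1.76)/6.02$. The short sketch below does so and checks the $2\sigma$ statement, assuming that the quoted $\pm$ values are one standard deviation of the 100 runs and that the interval is meant as mean $\pm 2\sigma$ (both are interpretations, not stated explicitly in the text). Function names are illustrative.

```python
def enob(sndr_db):
    """Effective number of bits from the SNDR of a full-scale sine wave."""
    return (sndr_db - 1.76) / 6.02


def within_n_sigma(measured, mean, sigma, n=2.0):
    """True if the measurement lies inside mean +/- n * sigma."""
    return abs(measured - mean) <= n * sigma


# (measured SNDR, simulated mean, simulated spread) in dB, from the text.
results = {
    "8-bit CBW": (46.0, 49.0, 1.7),
    "10-bit BWA": (52.6, 56.0, 2.0),
}

for name, (measured, mean, sigma) in results.items():
    print(f"{name}: ENoB = {enob(measured):.2f} bit, "
          f"inside mean +/- 2 sigma: {within_n_sigma(measured, mean, sigma)}")
```

Under these assumptions the measured values correspond to roughly 7.3 and 8.4 effective bits, and both fall inside the mean $\pm 2\sigma$ interval but outside mean $\pm 1\sigma$.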

#### 5.8.3 Switching Energy

Figure 5.20 shows the average switching energy as a function of the number of bits for the three considered single-ended array topologies, evaluated by means of the analytic expressions presented in [45] and reported here:

$$E_{ave,CBW} \cong 0.66 \cdot 2^N \left[ C_u \left( V_{REF,P} - V_{REF,N} \right)^2 \right]$$
(5.28)

$$E_{ave,SBW} \cong 0.41 \cdot 2^N \left[ C_u \left( V_{REF,P} - V_{REF,N} \right)^2 \right]$$
(5.29)

$$E_{ave,BWA} \cong 1.25 \cdot 2^{\frac{N}{2}} \left[ C_u \left( V_{REF,P} - V_{REF,N} \right)^2 \right].$$
 (5.30)

The switching energy of the fully-differential counterparts can be obtained by multiplying the above equations by a factor of 2. The average energy evaluated with CSAtool is also reported in Fig. 5.20 and shows a good agreement with the theoretical estimates.

Figure 5.19: Simulated SNDR as a function of the input signal amplitude for the a) 8-bit CBW, b) the 10-bit SBW and c) 10-bit BWA designed converters.

Figure 5.20: Average switching energy for the three converter topologies. Dashed lines refer to analytic equations, symbols refer to CSAtool results.

Figure 5.21: Switching energy as a function of the output code and normalized to the unit element for a fully-differential BWA ADC switched with the a) traditional and b) the monotonic algorithm. Black lines refer to CSAtool results while dashed lines show Cadence simulations.

| Converter resolution | Static metrics, CSAtool | Static metrics, Cadence | Dynamic metrics, CSAtool | Dynamic metrics, Cadence |
|----------------------|-------------------------|-------------------------|--------------------------|--------------------------|
| 8-bit                | 0.087 s                 | $2.5 \cdot 10^3$ s      | 1.99 s                   | $4 \cdot 10^3$ s         |
| 10-bit               | 0.272 s                 | $10^4$ s                | 2.66 s                   | $4 \cdot 10^3$ s         |

Table 5.2: Single simulation time

| Prototype resolution | Static metrics | Dynamic metrics |
|----------------------|----------------|-----------------|
| 8-bit                | 6.491 s        | 172 s           |
| 10-bit               | 25.25 s        | 183 s           |

Table 5.3: MonteCarlo simulation times for 100 runs

As further evidence of the CSAtool accuracy, the switching energy as a function of the output code has been evaluated by means of transient simulations in Cadence for the 10-bit fully-differential BWA converter, with both the traditional and the monotonic switching algorithm. The simulated energy values, normalized to the unit capacitance, are shown in Fig. 5.21 and compared to CSAtool results. The positive and the negative reference voltages have been set equal to the power supply and ground, respectively. The estimated average energy of the BWA topology employing the traditional switching algorithm is $81.3C_uV_{DD}^2$ in CSAtool and $82.74C_uV_{DD}^2$ in Cadence, while for the monotonic switching procedure the estimates are $30C_uV_{DD}^2$ and $29.8C_uV_{DD}^2$, respectively. Similar results can be achieved for all the topologies supported by CSAtool, showing that the implemented models are also suitable for estimating the DAC switching energy.

Finally, it must be noted that, while CSAtool took a few seconds to compute the switching energy over all the codes, the Spectre analysis required a parametric simulation lasting more than 3 hours.
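
As a quick numerical check of Eqs. (5.28)-(5.30), the sketch below evaluates the average switching energy normalized to $C_u (V_{REF,P} - V_{REF,N})^2$ and compares the 10-bit fully-differential BWA value with the CSAtool and Cadence figures quoted above for the traditional switching algorithm. The dictionary and function names are illustrative, and the equations are assumed here to describe the traditional switching scheme only (the monotonic procedure is not covered by them).

```python
# Coefficient k and exponent factor e of Eqs. (5.28)-(5.30):
# E_ave ~= k * 2**(e * N) * Cu * (VREF,P - VREF,N)**2
TOPOLOGIES = {
    "CBW": (0.66, 1.0),  # Eq. (5.28)
    "SBW": (0.41, 1.0),  # Eq. (5.29)
    "BWA": (1.25, 0.5),  # Eq. (5.30)
}


def average_switching_energy(topology, n_bits, fully_differential=False):
    """Average switching energy normalized to Cu * (VREF,P - VREF,N)**2."""
    coeff, exponent = TOPOLOGIES[topology]
    energy = coeff * 2.0 ** (exponent * n_bits)
    # The fully-differential array switches two identical halves.
    return 2.0 * energy if fully_differential else energy


if __name__ == "__main__":
    for name in TOPOLOGIES:
        for n_bits in (8, 10):
            e_se = average_switching_energy(name, n_bits)
            print(f"{name} {n_bits}-bit s.e.: {e_se:8.2f} Cu*Vref^2")

    analytic = average_switching_energy("BWA", 10, fully_differential=True)
    for label, value in (("CSAtool", 81.3), ("Cadence", 82.74)):
        error = 100.0 * (analytic - value) / value
        print(f"BWA 10-bit f.d.: analytic {analytic:.1f} vs {label} {value} "
              f"({error:+.1f}%)")
```

For the 10-bit fully-differential BWA array the analytic estimate is $2 \cdot 1.25 \cdot 2^{5} = 80\,C_uV_{DD}^2$, within a few percent of both simulated values.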

#### 5.8.4 Simulation Time

Table 5.2 compares the simulation times needed to compute the static and dynamic metrics with CSAtool and in the Cadence environment. The simulation times refer to 8- and 10-bit CBW converters and to a single simulation run. All the simulations were performed on a 3-GHz Pentium Xeon with 4 Gbyte of main memory. For the same accuracy (i.e. DNL lower than 1%), CSAtool reduces the simulation time by a factor of up to $10^4$. Table 5.3 shows the simulation time for a MonteCarlo analysis of 100 runs performed in CSAtool. For example, for a 10-bit converter, the tool computes the static and dynamic metrics in less than 5 minutes, while the same analysis in Cadence Virtuoso would require more than a week, which is impractical.



Figure 5.22: Screenshot of the CSAtool graphic user interface.

For the sake of clarity, it must be pointed out that the dynamic-metric values reported in Table 5.3 refer to the evaluation of a single point of the dynamic characteristic. However, since mismatch can significantly alter the SNDR characteristic, at least 10 MonteCarlo simulations have to be performed to capture the peak SNDR and thus correctly estimate the ENoB. Even in this case, CSAtool has a clear advantage over Cadence: an equivalent analysis performed in the traditional EDA-tool environment would require weeks, since estimating a single SNDR value with the same number of samples takes more than 1 hour.
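
The speed-up figures discussed in this section follow directly from Tables 5.2 and 5.3. The sketch below reproduces the arithmetic. The Cadence MonteCarlo time is an extrapolation, assuming that each of the 100 runs costs exactly one single-run static plus one single-run dynamic simulation; the variable and function names are illustrative.

```python
# Single-run times in seconds from Table 5.2: (CSAtool, Cadence).
SINGLE_RUN = {
    8: {"static": (0.087, 2.5e3), "dynamic": (1.99, 4e3)},
    10: {"static": (0.272, 1e4), "dynamic": (2.66, 4e3)},
}

# CSAtool MonteCarlo times in seconds for 100 runs from Table 5.3:
# (static metrics, dynamic metrics).
MONTECARLO_CSATOOL = {8: (6.491, 172.0), 10: (25.25, 183.0)}


def speedup(n_bits, metric):
    """Ratio between Cadence and CSAtool single-run simulation times."""
    csatool, cadence = SINGLE_RUN[n_bits][metric]
    return cadence / csatool


def cadence_montecarlo_days(n_bits, runs=100):
    """Extrapolated Cadence MonteCarlo time (static + dynamic), in days."""
    per_run = sum(cadence for _, cadence in SINGLE_RUN[n_bits].values())
    return runs * per_run / 86400.0


if __name__ == "__main__":
    for n_bits in (8, 10):
        for metric in ("static", "dynamic"):
            print(f"{n_bits}-bit {metric}: speed-up = "
                  f"{speedup(n_bits, metric):.0f}x")
        csatool_total = sum(MONTECARLO_CSATOOL[n_bits])
        print(f"{n_bits}-bit MonteCarlo: CSAtool {csatool_total / 60.0:.1f} min, "
              f"Cadence about {cadence_montecarlo_days(n_bits):.0f} days")
```

For the 10-bit case this gives about 3.5 minutes in CSAtool against more than two weeks of extrapolated Cadence time, consistent with the statement that the tool completes the analysis in less than 5 minutes while Cadence would need more than a week.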

# 5.9 Conclusions

A fast and accurate MATLAB-based tool, named CSAtool, for the analysis and design of the capacitive arrays of SAR ADCs has been presented in this chapter; it was developed as the main side activity of the PhD. It simulates the effects of both technology mismatch and parasitics on linearity performance and power consumption. The tool relies on static operations among vectors rather than on solving ordinary differential equations, thus greatly reducing the computation time compared to common integrated circuit design environments. Moreover, it does not require a fine calibration of simulation parameters (time step, strobe period, etc.). CSAtool results show an excellent agreement with conventional post-layout simulations performed on three designed converters and a fair agreement with measurement results on two fabricated prototypes. A graphical user interface, shown in Fig. 5.22, eases the handling of the implemented models, allowing the user to set the technology mismatch parameters, to load the parasitic capacitance pattern and to select the desired analysis. The proposed CSAtool can be downloaded with encrypted scripts from ftp://ftp.elet.polimi.it/outgoing/Andrea.Bonfanti/CSAtool.

# Bibliography

- [1] S. Brenna *et al.*, "Fundamental power limits of SAR and $\Sigma\Delta$ analog-to-digital converters," *Proceedings of the NORCAS*, 2015.
- [2] Editing, "Semiconductor technology nodes: History, trends and forecast," AnySilicon.com, 2013.
- [3] E. Vittoz, "Future of analog in the VLSI environment," IEEE Proceedings of the ISCAS, 1990.
- [4] P. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1212–1224, June 2005.
- [5] W. S. Liew, X. Zou, and Y. Lian, "A 0.5V 1.13-μW/channel neural recording interface with digital multiplexing scheme," *Proceedings of ESSCIRC*, pp. 219–222, Sept. 2011.
- [6] R. Walden, "Analog-to-digital converter survey and analysis," IEEE J. Selected Areas in Communications, vol. 17, no. 4, pp. 539–550, 1999.
- [7] B. Murmann, "Energy limits in A/D converters," IEEE FTFC, 2013.
- [8] M. A. L. L. Florido *et al.*, "A front-end ASIC for a 3-D magnetometer for space applications by using anisotropic magnetoresistors," *Magnetics, IEEE Transactions on*, vol. 51, no. 1, Jan. 2015.
- [9] S. Brenna, P. Minotti, A. Bonfanti, G. Laghi, G. Langfelder, A. Longoni, and A. Lacaita, "A low-noise sub-500µW Lorentz force based integrated magnetic field sensing system," in *Micro Electro Mechanical Systems (MEMS)*, 2015 28th IEEE International Conference on, Jan 2015, pp. 932–935.
- [10] H. Emmerich, M. Schofthaler, and U. Knauss, "A novel micromachined magnetic-field sensor," in *Micro Electro Mechanical Systems*, 1999. MEMS '99. Twelfth IEEE International Conference on, Jan 1999, pp. 94–99.
- [11] M. Thompson and D. Horsley, "Parametrically amplified MEMS magnetometer," in Solid-State Sensors, Actuators and Microsystems Conference, 2009. TRANSDUCERS 2009. International, June 2009, pp. 1194–1197.
- [12] J. Kyynarainen, J. Saarilahti, H. Kattelus, A. Karkkainen, T. Meinander, A. Oja, P. Pekko, H. Seppa, M. Suhonen, H. Kuisma, S. Ruotsalainen, and M. Tilli, "A 3D micromechanical compass," *Sensors and Actuators A: Physical*, vol. 142, no. 2, pp. 561–568, 2008.

- [13] G. Langfelder, C. Buffa, A. Frangi, A. Tocchio, E. Lasalandra, and A. Longoni, "Z-Axis Magnetometers for MEMS Inertial Measurement Units Using an Industrial Process," *Industrial Electronics, IEEE Transactions on*, vol. 60, no. 9, pp. 3983–3990, Sept 2013.
- [14] M. Li and D. Horsley, "Offset Suppression in a Micromachined Lorentz Force Magnetic Sensor by Current Chopping," *Microelectromechanical Systems, Journal of*, vol. 23, no. 6, pp. 1477–1484, Dec 2014.
- [15] G. Langfelder and A. Tocchio, "Operation of Lorentz-Force MEMS Magnetometers With a Frequency Offset Between Driving Current and Mechanical Resonance," *Magnetics, IEEE Transactions on*, vol. 50, no. 1, pp. 1–6, Jan 2014.
- [16] B. Bahreyni and C. Shafai, "A Resonant Micromachined Magnetic Field Sensor," Sensors Journal, IEEE, vol. 7, no. 9, pp. 1326–1334, Sept 2007.
- [17] M. Li, S. Sonmezoglu, and D. Horsley, "Extended Bandwidth Lorentz Force Magnetometer Based on Quadrature Frequency Modulation," *Microelectromechanical Systems, Journal of*, 2014.
- [18] M. Li, S. Nitzan, and D. Horsley, "Frequency-Modulated Lorentz Force Magnetometer With Enhanced Sensitivity via Mechanical Amplification," *Electron Device Letters, IEEE*, vol. 36, no. 1, pp. 62–64, Jan 2015.
- [19] C. Acar, A. Schofield, A. Trusov, L. Costlow, and A. Shkel, "Environmentally Robust MEMS Vibratory Gyroscopes for Automotive Applications," *Sensors Journal, IEEE*, vol. 9, no. 12, pp. 1895–1906, Dec 2009.
- [20] G. Langfelder, S. Dellea, A. Berthelot, P. Rey, A. Tocchio, and A. Longoni, "Analysis of Mode-Split Operation in MEMS Based on Piezoresistive Nanogauges," *Microelectromechanical Systems, Journal of*, vol. 24, no. 1, pp. 174–181, Feb 2015.
- [21] G. Langfelder, G. Laghi, P. Minotti, A. Tocchio, and A. Longoni, "Off-Resonance Low-Pressure Operation of Lorentz Force MEMS Magnetometers," *Industrial Electronics, IEEE Transactions on*, vol. 61, no. 12, pp. 7124–7130, Dec 2014.
- [22] M. Zaman, A. Sharma, Z. Hao, and F. Ayazi, "A Mode-Matched Silicon-Yaw Tuning-Fork Gyroscope With Subdegree-Per-Hour Allan Deviation Bias Instability," *Microelectromechanical Systems, Journal of*, vol. 17, no. 6, pp. 1526–1536, Dec 2008.
- [23] B. Simon, A. Trusov, and A. Shkel, "Anti-phase mode isolation in tuning-fork MEMS using a lever coupling design," in *Sensors*, 2012 IEEE, Oct 2012, pp. 1–4.
- [24] M. Li, V. Rouf, M. Thompson, and D. Horsley, "Three-Axis Lorentz-Force Magnetic Sensor for Electronic Compass Applications," *Microelectromechanical Systems, Journal of*, vol. 21, no. 4, pp. 1002–1010, Aug 2012.
- [25] V. Kempe, Inertial MEMS: Principles and Practice. Cambridge University Press, 2011.
- [26] A. Frangi, A. Ghisi, and L. Coronato, "On a deterministic approach for the evaluation of gas damping in inertial MEMS in the free-molecule regime," *Sensors and Actuators A: Physical*, vol. 149, no. 1, pp. 21–28, 2009.

- [27] S. Brenna, P. Minotti, G. Langfelder, A. Bonfanti, A. Longoni, and A. Lacaita, "Low-noise, low-power and extended bandwidth MEMS magnetic field sensing system," in *Proceedings of the 3rd CSCCA*, November 2014, pp. 13–21.
- [28] R. Harrison and C. Charles, "A low-power low-noise CMOS amplifier for neural recording applications," *IEEE J. Solid-State Circuit*, no. 6, pp. 958–965, 2003.
- [29] A. Bonfanti, M. Ceravolo, G. Zambra, R. Gusmeroli, T. Borghi, A. S. Spinelli, and A. L. Lacaita, "A multi-channel low-power IC for neural spike recording with data compression and narrowband 400-MHz MC-FSK wireless transmission," *Proceedings of ESSCIRC*, pp. 330–333, Sept. 2010.
- [30] R. Harrison and C. Charles, "A low-power low-noise CMOS amplifier for neural recording applications," Solid-State Circuits, IEEE Journal of, vol. 38, no. 6, pp. 958–965, June 2003.
- [31] B. Razavi, Design of Analog CMOS Integrated Circuits, 1st ed. New York, NY, USA: McGraw-Hill, Inc., 2001.
- [32]
- [33] M. Vagner, P. Benes, and Z. Havranek, "Experience with Allan variance method for MEMS gyroscope performance characterization," in *Instrumentation and Measurement Technology Conference (I2MTC)*, 2012 IEEE International, May 2012, pp. 1343–1347.
- [34] M. Li, E. Ng, V. Hong, C. Ahn, Y. Yang, T. Kenny, and D. Horsley, "Single-structure 3-axis Lorentz force magnetometer with sub-30 nT/√Hz resolution," in *Micro Electro Mechanical Systems (MEMS)*, 2014 IEEE 27th International Conference on, Jan 2014, pp. 80–83.
- [35] V. Kumar, M. Mahdavi, X. Guo, E. Mehdizadeh, and S. Pourkamali, "Ultra sensitive Lorentz force MEMS magnetometer with pico-tesla limit of detection," in *Micro Electro Mechanical Systems (MEMS)*, 2015 28th IEEE International Conference on, Jan 2015, pp. 204–207.
- [36] S. Dominguez-Nicolas, R. Juarez-Aguirre, P. Garcia-Ramirez, and A. Herrera-May, "Signal Conditioning System With a 4-20 mA Output for a Resonant Magnetic Field Sensor Based on MEMS Technology," *Sensors Journal*, *IEEE*, vol. 12, no. 5, pp. 935–942, May 2012.
- [37] A. A. Seshia, W. Low, S. A. Bhave, R. T. Howe, and S. Montague, "Micromechanical pierce oscillator for resonant sensing applications," in *Technical Proceedings of the 2002 International Conference on Modeling and Simulation of Microsystems*, 2002, pp. 162–165.
- [38] E. Vittoz, M. Degrauwe, and S. Bitz, "High-performance crystal oscillator circuits: theory and application," *Solid-State Circuits, IEEE Journal of*, vol. 23, no. 3, pp. 774–783, June 1988.
- [39] H. Barrow, T. Naing, R. Schneider, T. Rocheleau, V. Yeh, Z. Ren, and C.-C. Nguyen, "A real-time 32.768-kHz clock oscillator using a 0.0154-mm2 micromechanical resonator frequency-setting element," in *Frequency Control Symposium (FCS)*, 2012 IEEE International, May 2012, pp. 1-6.

- [40] M. Caruso, T. Bratland, C. Smith, and R. Schneider, "A new perspective on magnetic field sensing," Nonvolatile Electronics, 1998.
- [41] C. Enz and G. Temes, "Circuit techniques for reducing the effects of op-amp imperfections: Autozeroing, correlated double sampling, and chopper stabilization," *Proceedings of the IEEE*, vol. 84, no. 11, pp. 1584–1614, Nov. 1996.
- [42] B. Murmann, "Energy limits in A/D converters," Invited public lecture at CERN, Geneve, 2012.
- [43] J. McCreary and P. Gray, "All-MOS charge redistribution analog-to-digital conversion techniques. I," IEEE J. of Solid State Circuits, vol. SC-10, no. 6, pp. 371–379, Dec. 1975.
- [44] A. Chandrakasan and B. Ginsburg, "An energy-efficient charge recycling approach for a SAR converter with capacitive DAC," Proc. Int. Symp. on Circ. and Syst. (ISCAS), pp. 184–187, May 2005.
- [45] M. Saberi, R. Lotfi, K. Mafinezhad, and W. Serdijn, "Analysis of power consumption and linearity in capacitive digital-to-analog converters used in successive approximation ADCs," *IEEE Trans. Circuits Syst. I: Reg. Paper*, vol. 58, no. 7, pp. 1736–1747, Aug. 2011.
- [46] B. Razavi, "Design of analog CMOS integrated circuits," *McGraw Hill*, 2000.
- [47] C. C. Liu, S. J. Chang, G. Y. Huang, and Y. Z. Lin, "A 10-bit 50-MS/s SAR ADC with a monotonic capacitor switching procedure," *IEEE J. of Solid State Circuits*, vol. 45, no. 4, pp. 731–740, Apr. 2010.
- [48] M. van Elzakker et al., "10-bit charge-redistribution ADC consuming 1.9μW at 1 MS/s," IEEE J. of Solid State Circuits, vol. 45, no. 5, pp. 1007–1015, May 2010.
- [49] P. Harpe, E. Cantatore, G. Haller, and B. Murmann, "A 2.2/2.7fJ/conversion-step 10/12b 40kS/s SAR ADC with data-driven noise reduction," *Dig. Tech. Papers Int. Solid State Circuits Conf.*, pp. 270–271, Feb. 2013.
- [50] P. Figueiredo and J. Vital, "Kickback noise reduction techniques for CMOS latched comparators," *IEEE Trans. on Circuits and Systems II*, vol. 53, no. 7, pp. 541–545, Jul. 2006.
- [51] L. Hochberg, M. Serruya, G. Friehs, J. Mukand, M. Saleh, A. Caplan, A. Branner, D. Chen, R. Penn, and J. Donoghue, "Neuronal ensemble control of prosthetic devices by a human with tetraplegia," *Nature*, no. 442, pp. 164–171, 2006.
- [52] J. N. Y. Aziz, K. Abdelhalim, R. Shulyzki, R. Genov, B. L. Bardakjian, M. Derchansky, D. Serletis, and P. L. Carlen, "256-channel neural recording and delta compression microsystem with 3D electrodes," *IEEE J. Solid-State Circuit*, no. 3, pp. 995–1005, 2009.
- [53] R. Harrison, "A low-power integrated circuit for adaptive detection of action potentials in noisy signals." Proc. 2003 Intl. Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3325-3328, 2005.
- [54] M. Chae, W. Liu, Z. Yang, T. Chen, J. Kim, M. Sivaprakasam, and M. Yuce, "A 128-channel 6mW wireless neural recording IC with on-the-fly spike sorting and UWB transmitter," *IEEE Int. Solid-State Circ. Conf.*, pp. 146–148, 2008.

- [55] J. Rabaey et al., "A fully-integrated, miniaturized (0.125 mm²) 10.5 μW wireless neural recording sensor," Journal of Solid State Circuits, 2013.
- [56] S. Brenna et al., "Analysis and optimization of a SAR ADC with attenuation capacitor," Proceedings of the 37th Int. MIPRO Conf., pp. 68–73, May 2014.
- [57] P. Harpe et al., "A 26 μW 8 bit 10 MS/s asynchronous SAR ADC for low energy radios," IEEE J. of Solid State Circuits, vol. 46, no. 7, pp. 1585–1595, Jul. 2012.
- [58] J. A. Fredenburg and M. P. Flynn, "Statistical analysis of ENOB and yield in binary weighted ADCs and DACs with random element mismatch," *IEEE Trans. Circuits Syst. I: Reg. Paper*, vol. 59, no. 7, pp. 1396–1408, Jul. 2012.
- [59] A. Chandrakasan and B. Ginsburg, "500-MS/s 5-bit ADC in 65-nm CMOS with split capacitor array DAC," IEEE J. Solid State Circuits, vol. 42, no. 4, pp. 739-747, Apr. 2007.
- [60] D. Zhang, A. Bhide, and A. Alvandpour, "A 53-nW 9.1-ENOB 1-kS/s SAR ADC in 0.13μm CMOS for medical implant devices," *IEEE J. Solid State Circuits*, vol. 47, no. 7, pp. 1585–1593, Jul. 2012.
- [61] V. Tripathi and B. Murmann, "Mismatch characterization of small metal fringe capacitors," Proc. Cust. Int. Circ. Conf. (CICC), pp. 1–4, Sept. 2013.
- [62] A. Abusleme, A. Dragone, G. Haller, and B. Murmann, "Mismatch of lateral field metal-oxide-metal capacitors in 180nm CMOS process," *IEEE Electronic Letters*, vol. 48, pp. 286–287, Mar. 2012.
- [63] H. Hong and G. Lee, "A 65-fJ/conversion-step 0.9-V 200-kS/s rail-to-rail 8-bit successive approximation ADC," *IEEE J. Solid State Circuits*, vol. 42, no. 10, pp. 2161–2168, Oct. 2007.
- [64] A. Agnes, E. Bonizzoni, P. Malcovati, and F. Maloberti, "An ultra-low power successive approximation A/D converter with time-domain comparator," Analog Integrated Circuits and Signal Processing, vol. 64, pp. 183–190, 2010.
- [65] G. Huang and P. Lin, "An 8.38 fJ/conversion-step 0.6 V 8-b 4.35 MS/s asynchronous SAR ADC in 65 nm CMOS," Analog Integrated Circuits and Signal Processing, vol. 73, pp. 265–272, 2012.
- [66] H. Wu, B. Li, W. Huang, Z. Li, M. Zou, and Y. Wang, "A 1.2V 8-bit 1 MS/s SAR ADC with res-cap segment DAC for temperature sensor in LTE," Analog Integrated Circuits and Signal Processing, vol. 73, pp. 225–232, 2012.
- [67] F. Padovan et al., "A 20 Mb/s, 2.76 pJ/b UWB impulse radio TX with 11.7% efficiency in 130 nm CMOS," Proceedings of the ESSCIRC, 2014.
- [68]
- [69] H. Gao et al., "HermesE: A 96-channel full data rate direct neural interface in 0.13-μm CMOS," IEEE J. of Solid-State Circuits, vol. 47, no. 4, pp. 1043–1056, April 2012.

- [70] S. Brenna, A. Bonetti, A. Bonfanti, and A. L. Lacaita, "A simulation and modeling environment for the analysis and design of charge redistribution DACs used in SAR ADCs," *Proceedings of the 37th Int. MIPRO Conf.*, pp. 74–79, May 2014.
- [71] A. Agnes, E. Bonizzoni, and F. Maloberti, "Design of an ultra-low power SA-ADC with medium/high resolution and speed," Proc. Int. Symp. on Circ. and Syst. (ISCAS), pp. 1–4, 2008.
- [72] S. Brenna, A. Bonetti, A. Bonfanti, and A. L. Lacaita, "A tool for the assisted design of charge redistribution SAR ADCs," Accepted at the Design, Automation and Test in Europe Int. Conf., March 2015.
- [73] A. Agnes, E. Bonizzoni, and F. Maloberti, "A 9.4-ENOB 1V 3.8μW 100kS/s SAR ADC with time-domain comparator," *Dig. Tech. Papers Int. Solid State Circuits Conf.*, vol. 37, no. 2, pp. 246–610, Feb. 2008.
- [74] S. Brenna, A. Bonfanti, and A. L. Lacaita, "A 6-fJ/conversion-step 200kSps asynchronous SAR ADC with attenuation capacitor in 130-nm CMOS," Analog Integrated Circuits and Signal Processing, vol. 81, pp. 181–194, Aug 2014.