Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing

This paper presents Mr.Wolf, a parallel ultra-low power (PULP) system on chip (SoC) featuring a hierarchical architecture with a small (12 kgates) microcontroller (MCU) class RISC-V core augmented with an autonomous IO subsystem for efficient data transfer from a wide set of peripherals. The small core can offload compute-intensive kernels to an eight-core, floating-point-capable processing engine that is available on demand. The proposed SoC, implemented in a 40-nm LP CMOS technology, features a 108-$\mu\text{W}$ fully retentive memory (512 kB). The IO subsystem can transfer up to 1.6 Gbit/s from external devices to the memory while consuming less than 2.5 mW. The eight-core compute cluster achieves a peak performance of 850 million 32-bit integer multiply-and-accumulate operations per second (MMAC/s) and 500 million 32-bit floating-point multiply-and-accumulate operations per second (MFMAC/s), equivalent to 1 GFlop/s, with an energy efficiency of up to 15 MMAC/s/mW and 9 MFMAC/s/mW. These building blocks are supported by aggressive on-chip power conversion and management, enabling energy-proportional heterogeneous computing for always-on IoT end nodes and improving performance by several orders of magnitude with respect to traditional single-core MCUs, within a power envelope of 153 mW. We demonstrate the capabilities of the proposed SoC on a wide set of near-sensor processing kernels, showing that Mr.Wolf can deliver performance up to 16.4 GOp/s with energy efficiency up to 274 MOp/s/mW on real-life applications, paving the way for always-on data analytics on high-bandwidth sensors at the edge of the Internet of Things.
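The headline figures above can be converted into per-operation energies with simple unit arithmetic. The following is a minimal sketch, not part of the paper: the function names are hypothetical, it assumes one multiply-accumulate counts as two floating-point operations, and it treats the quoted peak values independently, since peak performance and peak efficiency need not occur at the same operating point.

```python
# Unit conversions for the figures quoted in the abstract.
# All helper names are illustrative assumptions, not from the paper.

def fmac_to_flops(fmac_per_s: float) -> float:
    """Convert FMAC/s to Flop/s, assuming 1 MAC = 1 multiply + 1 add."""
    return 2.0 * fmac_per_s

def energy_per_op_pj(mops_per_s_per_mw: float) -> float:
    """Energy per operation in pJ for an efficiency given in MOp/s/mW.

    1 MOp/s/mW = 1e6 op/s / 1e-3 W = 1e9 op/J, i.e. 1000 pJ per op.
    """
    return 1000.0 / mops_per_s_per_mw

def energy_per_bit_pj(power_w: float, bits_per_s: float) -> float:
    """Energy per transferred bit in pJ for a given power and bandwidth."""
    return power_w / bits_per_s * 1e12

def retention_uw_per_kb(power_w: float, size_kb: float) -> float:
    """Retention power per kB of memory, in microwatts."""
    return power_w * 1e6 / size_kb

if __name__ == "__main__":
    print(fmac_to_flops(500e6))              # Flop/s at 500 MFMAC/s
    print(energy_per_op_pj(15))              # pJ per integer MAC at 15 MMAC/s/mW
    print(energy_per_op_pj(9))               # pJ per FP MAC at 9 MFMAC/s/mW
    print(energy_per_bit_pj(2.5e-3, 1.6e9))  # upper bound, IO power is "less than" 2.5 mW
    print(retention_uw_per_kb(108e-6, 512))  # uW per kB of retentive memory
```

Under these assumptions, 500 MFMAC/s corresponds to the quoted 1 GFlop/s, the integer and floating-point efficiencies correspond to roughly 67 pJ and 111 pJ per MAC, and the IO subsystem spends at most about 1.6 pJ per transferred bit.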
