MANIC: A Vector-Dataflow Architecture for Ultra-Low-Power Embedded Systems

Ultra-low-power sensor nodes enable many new applications and are becoming increasingly pervasive and important. Energy efficiency is the key determinant of the value of these devices: battery-powered nodes must make their batteries last, and nodes that harvest energy should minimize the time they spend recharging. Unfortunately, current devices are energy-inefficient. In this work, we present MANIC, a new, highly energy-efficient architecture targeting the ultra-low-power sensor domain. MANIC achieves high energy efficiency while maintaining programmability and generality. MANIC introduces vector-dataflow execution, which allows it to exploit the dataflows in a sequence of vector instructions and to amortize instruction fetch and decode over a whole vector of operations. By forwarding values from producers to consumers, MANIC avoids costly vector register file reads. By carefully scheduling code and avoiding dead register writes, MANIC avoids costly vector register file writes. Across seven benchmarks, MANIC is on average 2.8× more energy efficient than a scalar baseline and 38.1% more energy efficient than a vector baseline, and it comes within 26.4% of an idealized design.
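To make the register-file savings concrete, the following is a minimal sketch of a toy accounting model, not a description of MANIC's actual microarchitecture. It counts vector register file accesses for a short window of dependent vector instructions under two policies: plain vector execution, where every operand passes through the register file, and a vector-dataflow-style policy, where a value produced earlier in the window is forwarded to its consumers and a destination that is dead after the window is never written back. The names `VecInstr`, `dst_live_out`, `WINDOW`, and both counting functions are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VecInstr:
    """One vector instruction in a toy window (hypothetical representation)."""
    dst: str
    srcs: Tuple[str, ...]
    # Assumed compiler-provided hint: is dst still needed after the window?
    dst_live_out: bool = True


def count_plain_vector(window, vlen):
    """Plain vector execution: every source is read from, and every
    destination is written to, the vector register file for each element."""
    reads = sum(len(i.srcs) for i in window) * vlen
    writes = len(window) * vlen
    return reads, writes


def count_vector_dataflow(window, vlen):
    """Toy vector-dataflow policy: for each element, run the whole window,
    forward values produced earlier in the window, and skip dead writes."""
    reads = writes = 0
    for _ in range(vlen):
        forwarded = set()  # registers produced earlier in this window
        for idx, instr in enumerate(window):
            # Only sources not produced inside the window hit the register file.
            reads += sum(1 for s in instr.srcs if s not in forwarded)
            forwarded.add(instr.dst)
            overwritten_later = any(n.dst == instr.dst for n in window[idx + 1:])
            # A write is dead if the value is overwritten in the window
            # or is not live after the window.
            if instr.dst_live_out and not overwritten_later:
                writes += 1
    return reads, writes


# Example window: v2 = v0 * v1; v3 = v2 + v1; v4 = v3 + v0.
# v2 and v3 are assumed to be temporaries that are dead after the window.
WINDOW = [
    VecInstr("v2", ("v0", "v1"), dst_live_out=False),
    VecInstr("v3", ("v2", "v1"), dst_live_out=False),
    VecInstr("v4", ("v3", "v0"), dst_live_out=True),
]

if __name__ == "__main__":
    print("plain vector   :", count_plain_vector(WINDOW, 16))
    print("vector-dataflow:", count_vector_dataflow(WINDOW, 16))
```

For this three-instruction window with a vector length of 16, the toy model counts 96 reads and 48 writes under plain vector execution, against 64 reads and 16 writes with forwarding and dead-write elimination. The counts illustrate the mechanism only; they say nothing about the energy figures reported above.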
