A Machine Learning Accelerator In-Memory for Energy Harvesting

There is increasing demand to bring machine learning capabilities to low-power devices. Combining the computational power of machine learning with the deployment flexibility of low-power devices makes a number of new applications possible. In some applications, such devices will not even have a battery and must rely solely on energy harvesting. This places extreme constraints on the hardware, which must be energy efficient and able to tolerate interruptions due to power outages. Here, as a representative example, we propose an in-memory support vector machine learning accelerator utilizing non-volatile spintronic memory. The combination of processing-in-memory and non-volatility provides a key advantage: progress is effectively saved after every operation. This enables instant shutdown and restart with minimal overhead. Additionally, the operations are highly energy efficient, leading to low power consumption.
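The idea that progress is saved after every operation can be illustrated in software. The following is a minimal Python sketch and not the proposed hardware design: it assumes a linear SVM decision function, models non-volatile memory as a dictionary that survives simulated power failures, and assumes failures occur only between operations. All names (`PowerFailure`, `new_state`, `svm_decision`, `run_intermittently`) are hypothetical.

```python
import random


class PowerFailure(Exception):
    """Simulates the harvested energy supply dropping out."""


def new_state():
    # Hypothetical non-volatile state: next feature index and running sum.
    return {"i": 0, "acc": 0.0}


def svm_decision(weights, bias, x, nv, fail_prob=0.0, rng=None):
    """Resume a linear SVM decision (sign of w.x + b) from the state in nv."""
    if rng is None:
        rng = random.Random(0)
    while nv["i"] < len(weights):
        # Assumption: power can fail only between operations.
        if rng.random() < fail_prob:
            raise PowerFailure()
        i = nv["i"]
        # One multiply-accumulate; the result and the index are stored
        # together, so a completed operation is never repeated or lost.
        nv["acc"], nv["i"] = nv["acc"] + weights[i] * x[i], i + 1
    return 1 if nv["acc"] + bias >= 0 else -1


def run_intermittently(weights, bias, x, fail_prob, seed=0):
    """Retry after every simulated outage, reusing the same nv state."""
    rng = random.Random(seed)
    nv = new_state()
    restarts = 0
    while True:
        try:
            label = svm_decision(weights, bias, x, nv, fail_prob, rng)
            return label, restarts
        except PowerFailure:
            restarts += 1  # volatile context is lost, nv is kept
```

Because the state is kept across outages, `run_intermittently` returns the same label as an uninterrupted call to `svm_decision`, regardless of how many restarts occur; no checkpoint or rollback step is needed in this model.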
