PXNOR-BNN: In/With Spin-Orbit Torque MRAM Preset-XNOR Operation-Based Binary Neural Networks

Convolutional neural networks (CNNs) have demonstrated superior capability in computer vision, speech recognition, autonomous driving, and other domains, helping to open up an era of artificial intelligence (AI). However, conventional CNNs require substantial matrix computation and memory, which leads to power and memory constraints for deployment on mobile devices and embedded chips. On the algorithm side, emerging binary neural networks (BNNs) promise portable intelligence by replacing costly floating-point compute-and-accumulate operations with lightweight bit-wise XNOR and popcount operations. On the hardware side, computing-in-memory (CIM) architectures built on non-volatile memory (NVM) offer high speed and good power efficiency. In this paper, we propose an NVM-based CIM architecture, PXNOR-BNN, that employs a Preset-XNOR operation in/with spin–orbit torque magnetic random access memory (SOT-MRAM) to accelerate BNN computation. PXNOR-BNN performs the XNOR operation of BNNs inside the computing-buffer array with only slight modifications to the peripheral circuits. Layer-level evaluation results show that PXNOR-BNN achieves performance similar to that of its read-based SOT-MRAM counterpart. Finally, end-to-end estimation demonstrates a $12.3\times$ speedup over the baseline, with a throughput efficiency of 96.6 images/s/W.
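To illustrate how XNOR and popcount can stand in for a compute-and-accumulate step, the following is a minimal software sketch, not the in-memory circuit described in the paper. It assumes weights and activations take values in {-1, +1} and are encoded as bits (1 for +1, 0 for -1); the function names `encode`, `xnor_popcount_dot`, and `reference_dot` are hypothetical and chosen only for this example. Under that encoding, the dot product of two length-n vectors equals 2 * popcount(XNOR(a, b)) - n.

```python
def encode(values):
    """Pack a list of +1/-1 values into an integer (assumed encoding: bit i is 1 for +1, 0 for -1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1, +1} vectors using XNOR and popcount."""
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask     # a bit is 1 where the two inputs agree
    matches = bin(xnor).count("1")       # popcount: number of agreeing positions
    return 2 * matches - n               # agreements add +1, disagreements add -1


def reference_dot(a, b):
    """Ordinary multiply-and-accumulate, used only to check the bit-wise version."""
    return sum(x * y for x, y in zip(a, b))


a = [1, -1, -1, 1, 1, -1, 1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]
result = xnor_popcount_dot(encode(a), encode(b), len(a))
print(result, reference_dot(a, b))
```

In this example the two vectors agree in five of eight positions, so both the bit-wise and the reference computation give 2.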
