An Energy-efficient Processing-in-memory Architecture for Long Short Term Memory in Spin Orbit Torque MRAM

Many recent studies have focused on processing-in-memory (PIM) architectures for neural networks to resolve the memory bottleneck. In particular, Spin Orbit Torque (SOT)-MRAM has attracted growing interest due to its low latency, high energy efficiency, and non-volatility. However, previous work added extra computing circuits to support complicated computations, which results in large energy overheads. In this work, we propose a new PIM architecture with relatively small peripheral circuits that achieves the highest energy efficiency among PIM architectures for processing a Long Short Term Memory (LSTM). We improve efficiency with a new computing method for logical operations that exploits the characteristics of SOT-MRAM: the number of word lines (WLs) activated concurrently is reduced from two in previous works to one, which saves WL driving energy and reduces the sensing current required for computation. Moreover, we propose efficient in-memory methods for the additions, multiplications, and non-linear activation functions needed to process an LSTM. As a result, the proposed computing method for logical operations achieves 1.26x higher energy efficiency than the previous SOT-MRAM-based study, and up to 5.54x higher energy efficiency than previous PIM architectures based on other memory technologies.
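The abstract does not describe the circuit-level details, but the idea of composing in-memory logical operations into the additions an LSTM requires can be sketched functionally. The following is a minimal, hypothetical illustration (the `pim_*` primitive names are my own, not from the paper): a bit-serial ripple-carry adder built only from AND/OR/XOR, the kind of bulk bitwise primitives a PIM array exposes.

```python
# Stand-ins for in-memory bitwise primitives. In a real SOT-MRAM PIM
# array these would be performed on entire rows at once; here we model
# them on single bits for clarity. Names are illustrative assumptions.
def pim_and(a, b):
    return a & b

def pim_or(a, b):
    return a | b

def pim_xor(a, b):
    return a ^ b

def bitwise_add(a, b, width=8):
    """Ripple-carry addition of two unsigned integers composed solely
    from the logical primitives above, processed one bit position at a
    time (bit-serial), as PIM accelerators commonly do."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        # Full-adder logic: sum = x XOR y XOR carry
        s = pim_xor(pim_xor(x, y), carry)
        # carry_out = (x AND y) OR ((x XOR y) AND carry_in)
        carry = pim_or(pim_and(x, y), pim_and(pim_xor(x, y), carry))
        result |= s << i
    return result

print(bitwise_add(23, 42))  # → 65
```

Multiplication can then be built from repeated shifted additions of partial products, which is why reducing the per-operation sensing and WL-driving energy of the logical primitives compounds across a full LSTM workload.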
