A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation

The compute and data-movement bottlenecks of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Moreover, the deep structure and large model size of DNNs make them prohibitive for embedded systems and IoT devices, where low power consumption is required. To address these challenges, spin-orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM-based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs, since SOT-MRAM offers near-zero standby power, high density, and non-volatility. However, drawbacks of SOT-MRAM-based PIM engines, such as high write latency and the need for low bit-width data, limit their appeal as energy-efficient DNN accelerators. To mitigate these drawbacks, we propose an ultra-energy-efficient framework that applies model compression techniques, including weight pruning and quantization, at the software level while accounting for the architecture of the SOT-MRAM PIM engine. We further incorporate the alternating direction method of multipliers (ADMM) into the training phase to guarantee solution feasibility and satisfy the SOT-MRAM hardware constraints. As a result, the footprint and power consumption of the SOT-MRAM PIM engine are reduced while the overall system throughput increases, making our proposed ADMM-based SOT-MRAM PIM more energy efficient and better suited for embedded systems and IoT devices. Our experimental results show that the accuracy and compression rate of our proposed framework consistently outperform the reference works, while the efficiency (area and power) and throughput of the SOT-MRAM PIM engine are significantly improved.
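To illustrate the ADMM-regularized training idea described above, the following is a minimal NumPy sketch, not the paper's implementation: it splits the training objective into a loss term on weights W and a hardware-constraint term on an auxiliary copy Z (sparsity for pruning, a fixed low-bit alphabet for quantization), then alternates a gradient step, a projection step, and a dual update. The toy linear-regression loss and the names `admm_prune`, `project_sparse`, `project_quantized`, and `keep_ratio` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def project_sparse(M, keep_ratio):
    """Euclidean projection onto the sparsity constraint set:
    keep the largest-magnitude weights, zero out the rest."""
    k = max(1, int(round(keep_ratio * M.size)))
    flat = np.abs(M).ravel()
    thresh = np.partition(flat, M.size - k)[M.size - k]
    return np.where(np.abs(M) >= thresh, M, 0.0)

def project_quantized(M, levels):
    """Projection onto a fixed quantization alphabet: snap each weight
    to the nearest allowed level (e.g. low-bit values for SOT-MRAM cells)."""
    levels = np.asarray(levels, dtype=float)
    idx = np.argmin(np.abs(M[..., None] - levels), axis=-1)
    return levels[idx]

def admm_prune(X, Y, keep_ratio=0.1, rho=1e-2, lr=1e-3,
               outer_iters=30, inner_iters=200, seed=0):
    """Toy ADMM-regularized training of one linear layer with loss
    f(W) = (1/2n) * ||X W - Y||_F^2, subject to W being sparse.

    Splitting: minimize f(W) + g(Z) s.t. W = Z, where g is the
    indicator function of the hardware constraint set.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    W = rng.normal(scale=0.1, size=(X.shape[1], Y.shape[1]))
    Z = project_sparse(W, keep_ratio)   # auxiliary (constrained) copy
    U = np.zeros_like(W)                # scaled dual variable

    for _ in range(outer_iters):
        # W-step: gradient descent on f(W) + (rho/2)||W - Z + U||^2
        for _ in range(inner_iters):
            grad = X.T @ (X @ W - Y) / n + rho * (W - Z + U)
            W -= lr * grad
        # Z-step: project onto the constraint set (sparsity here;
        # project_quantized would be used analogously for quantization)
        Z = project_sparse(W + U, keep_ratio)
        # dual update
        U = U + W - Z

    # final hard projection so the returned weights satisfy the constraint
    return project_sparse(W, keep_ratio)
```

In this sketch the Z-step is the only place the hardware constraint enters, which is why the same loop handles pruning and low-bit quantization by swapping the projection; this mirrors the role ADMM plays in enforcing SOT-MRAM-friendly weight structure during training.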
