XMA2: A crossbar-aware multi-task adaption framework via 2-tier masks

Recently, ReRAM crossbar-based deep neural network (DNN) accelerators have been widely investigated. However, most prior works focus on single-task inference because of the high energy cost of weight reprogramming and the limited endurance of ReRAM cells; adapting a ReRAM crossbar-based DNN accelerator to multiple tasks has not been fully explored. In this study, we propose XMA2, a novel crossbar-aware learning method with a 2-tier masking technique that efficiently adapts a DNN backbone model deployed on a ReRAM crossbar to new tasks. During XMA2-based multi-task adaption (MTA), the tier-1 ReRAM crossbar processing-element-wise (PE-wise) mask is first learned to identify the most critical PEs to reprogram for the essential new features of the new task. Subsequently, the tier-2 crossbar column-wise mask is applied within the remaining weight-frozen PEs to learn a hardware-friendly, column-wise scaling factor for the new task without modifying the stored weight values. With these crossbar-aware design innovations, the required masking operations can be implemented in an existing crossbar-based convolution engine with minimal hardware/memory overhead. Extensive experimental results show that, compared with other state-of-the-art multi-task adaption methods, XMA2 achieves the highest accuracy on all popular multi-task learning datasets.
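To make the two masking tiers concrete, the following minimal PyTorch-style sketch applies them to a single crossbar-mapped fully connected layer. The tile size pe_size, the straight-through mask parameterization, and all names (TwoTierMaskedLinear, pe_logit, col_scale, w_new) are illustrative assumptions rather than the paper's actual implementation; the intent is only to show tier-1 selecting whole PE tiles for reprogramming and tier-2 scaling columns of the remaining frozen tiles without touching their stored weights.

    import torch
    import torch.nn as nn

    class TwoTierMaskedLinear(nn.Module):
        # Sketch of a crossbar-mapped layer adapted with XMA2-style 2-tier masks.
        # Tile size, mask parameterization, and variable names are illustrative
        # assumptions, not the paper's implementation.
        def __init__(self, backbone_weight, pe_size=64):
            super().__init__()
            out_f, in_f = backbone_weight.shape
            assert out_f % pe_size == 0 and in_f % pe_size == 0
            self.pe_size = pe_size
            # Frozen backbone weights shared across tasks (never reprogrammed).
            self.register_buffer("w_frozen", backbone_weight.clone())
            # Candidate task-specific weights for the few reprogrammed PEs.
            self.w_new = nn.Parameter(backbone_weight.clone())
            # Tier-1: one logit per PE tile, thresholded into a binary mask.
            self.pe_logit = nn.Parameter(torch.zeros(out_f // pe_size, in_f // pe_size))
            # Tier-2: one scaling factor per crossbar column inside each PE tile
            # (i.e., per output channel, per input tile).
            self.col_scale = nn.Parameter(torch.ones(out_f, in_f // pe_size))

        def forward(self, x):
            # Hard tier-1 mask with a straight-through estimator so it stays trainable.
            soft = torch.sigmoid(self.pe_logit)
            pe_mask = (soft > 0.5).float() + soft - soft.detach()
            # Broadcast the PE-wise mask to full weight resolution.
            m = pe_mask.repeat_interleave(self.pe_size, 0).repeat_interleave(self.pe_size, 1)
            # Tier-2 column scaling of the frozen PEs: scaling a column's partial
            # sum is equivalent to scaling its weights, so the stored cell
            # contents stay untouched.
            s = self.col_scale.repeat_interleave(self.pe_size, 1)
            # Reprogrammed PEs use new weights; frozen PEs keep scaled backbone weights.
            w = m * self.w_new + (1.0 - m) * self.w_frozen * s
            return x @ w.t()

    # Usage: wrap a pretrained layer and train only pe_logit, col_scale, and w_new
    # for the new task while w_frozen stays fixed.
    layer = TwoTierMaskedLinear(torch.randn(128, 256), pe_size=64)
    y = layer(torch.randn(4, 256))

Because the tier-2 factor acts as a per-column multiply on accumulated crossbar outputs, it can plausibly be folded into the per-column scaling already present in a crossbar convolution engine, which is consistent with the minimal-overhead claim above.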
