ENOS: Energy-Aware Network Operator Search for Hybrid Digital and Compute-in-Memory DNN Accelerators

This work proposes Energy-Aware Network Operator Search (ENOS), a novel approach to address the energy-accuracy trade-offs of deep neural network (DNN) accelerators. In recent years, novel inference operators such as binary-weight, multiplication-free, and deep-shift operators have been proposed to improve the computational efficiency of DNNs. Complementing these operators, their corresponding computing modes, such as compute-in-memory and XOR networks, have also been explored. However, simplifying DNN operators invariably comes at the cost of lower accuracy, especially on complex processing tasks. While prior works process a DNN end-to-end with a single operator and computing mode, the proposed ENOS framework allows an optimal layer-wise integration of inference operators and computing modes to achieve the desired balance of energy and accuracy. The search in ENOS is formulated as a continuous optimization problem, solvable using standard gradient-descent methods and therefore scalable to larger DNNs with minimal increase in training cost. We characterize ENOS under two settings. In the first setting, for digital accelerators, we discuss ENOS on multiply-accumulate (MAC) cores that can be reconfigured to different operators. ENOS training methods with single-level and bi-level optimization objectives are discussed and compared. We also discuss a sequential operator assignment strategy in ENOS that learns the assignment for only one layer per training step, enabling greater flexibility in converging towards the optimal operator allocation. Furthermore, following Bayesian principles, a sampling-based variational mode of ENOS is also presented. ENOS is characterized on the popular DNNs ShuffleNet and SqueezeNet on CIFAR10 and CIFAR100. Compared to conventional uni-operator approaches, under the same energy budget, ENOS improves accuracy by 10–20%. In the second setting, for a hybrid digital and compute-in-memory accelerator, we characterize ENOS to assign both the layer-wise computing mode (high-precision digital or low-precision analog compute-in-memory) and the operator while staying within a total compute-in-memory budget. Under varying configurations of the hybrid accelerator, ENOS leverages the higher energy efficiency of compute-in-memory operations to reduce the operating energy of DNNs by 5× while suffering <1% reduction in accuracy. Characterization results from ENOS also reveal interesting insights, such as the amenability of different filters to low-complexity operators, minimizing inference energy while maintaining high prediction accuracy.
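To make the formulation concrete, the following is a minimal sketch (not the authors' implementation) of how such a differentiable, energy-aware operator search can be set up in PyTorch: each layer mixes its candidate operators through a softmax over learnable selection logits, and the training loss adds an energy penalty so that gradient descent trades accuracy against per-layer operator energy. The class and variable names (MixedOpLayer, op_energy, lambda_energy) are illustrative assumptions, not names taken from the paper.

    # Hedged sketch of a DARTS-style, energy-aware operator search layer.
    # Candidate operators (e.g., full-precision MAC, binary-weight,
    # multiplication-free) are supplied by the caller; names are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedOpLayer(nn.Module):
        """One layer that mixes candidate operators, weighted by a softmax
        over learnable selection logits alpha (continuous relaxation)."""
        def __init__(self, candidate_ops, op_energy):
            super().__init__()
            self.ops = nn.ModuleList(candidate_ops)                      # candidate operator implementations
            self.register_buffer("op_energy", torch.tensor(op_energy))   # per-operator energy estimate
            self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))   # operator-selection logits

        def forward(self, x):
            w = F.softmax(self.alpha, dim=0)        # soft operator assignment
            return sum(wi * op(x) for wi, op in zip(w, self.ops))

        def expected_energy(self):
            # Differentiable expected energy of this layer under the current mix.
            return (F.softmax(self.alpha, dim=0) * self.op_energy).sum()

    def enos_loss(logits, targets, layers, lambda_energy=1e-3):
        # Single-level objective: task loss plus an energy penalty, so gradient
        # descent jointly learns weights and per-layer operator assignments.
        ce = F.cross_entropy(logits, targets)
        energy = sum(layer.expected_energy() for layer in layers)
        return ce + lambda_energy * energy

After training, a discrete assignment can be read off as the argmax of each layer's logits. Under the same relaxation, a bi-level variant would alternate updates of the network weights and the selection logits on separate data splits, and a sampling-based variant would draw each layer's operator from the softmax distribution (e.g., via a Gumbel-softmax) rather than mixing all candidates.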
