Power, Performance, and Area Benefit of Monolithic 3D ICs for On-Chip Deep Neural Networks Targeting Speech Recognition

In recent years, deep learning has become widespread for a variety of real-world recognition tasks. Beyond recognition accuracy, energy efficiency and speed (i.e., performance) remain grand challenges for enabling local intelligence in edge devices. In this article, we investigate the adoption of monolithic three-dimensional (3D) IC (M3D) technology for deep neural network (DNN) hardware design, using speech recognition as a test vehicle. M3D has recently emerged as a leading contender for addressing the power, performance, and area (PPA) scaling challenges of advanced technology nodes. Our study examines how key parameters of DNN hardware implementations affect performance and energy efficiency, including DNN architectural choices, underlying workloads, and tier-partitioning choices in M3D designs. Our post-layout M3D designs, combined with hardware-efficient sparse algorithms, deliver power savings and performance improvements beyond what conventional 2D ICs can achieve. Experimental results show that M3D offers a 22.3% iso-performance power saving and a 6.2% performance improvement, demonstrating its suitability for DNN ASICs. We further present architectural and physical design guidelines for M3D DNNs that maximize these benefits.
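To make the "hardware-efficient sparse algorithms" concrete, the sketch below illustrates coarse-grain (block) sparsity for a fully connected DNN layer: weights are pruned in contiguous tiles so an accelerator can skip entire multiply-accumulate blocks rather than individual zeros. This is a minimal NumPy illustration of the general technique, not the paper's implementation; the block size, layer shape, and function names are assumptions chosen for clarity.

```python
import numpy as np

BLOCK = 16  # tile size the hardware would skip as a unit (assumed)

def prune_blocks(W, keep_ratio=0.25):
    """Zero all but the largest-magnitude BLOCK x BLOCK tiles of W."""
    rows, cols = W.shape
    assert rows % BLOCK == 0 and cols % BLOCK == 0
    # View W as a grid of BLOCK x BLOCK tiles.
    tiles = W.reshape(rows // BLOCK, BLOCK, cols // BLOCK, BLOCK)
    norms = np.abs(tiles).sum(axis=(1, 3))          # one score per tile
    cutoff = np.quantile(norms, 1.0 - keep_ratio)   # keep the top tiles
    mask = (norms >= cutoff)[:, None, :, None]
    return (tiles * mask).reshape(rows, cols), mask[:, 0, :, 0]

def sparse_matvec(W, tile_mask, x):
    """Multiply-accumulate only the tiles the mask keeps (zeros skipped)."""
    y = np.zeros(W.shape[0])
    for i, j in zip(*np.nonzero(tile_mask)):
        r, c = i * BLOCK, j * BLOCK
        y[r:r + BLOCK] += W[r:r + BLOCK, c:c + BLOCK] @ x[c:c + BLOCK]
    return y

# Example: a 256x512 fully connected layer kept at ~25% block density.
rng = np.random.default_rng(0)
W, mask = prune_blocks(rng.standard_normal((256, 512)))
x = rng.standard_normal(512)
assert np.allclose(sparse_matvec(W, mask, x), W @ x)
```

Because the nonzeros are confined to whole tiles, the mask is small (one bit per tile) and the compute loop touches only dense sub-blocks, which is what makes this style of sparsity friendly to ASIC datapaths and on-chip memory layout.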
