Acoustic model training based on node-wise weight boundary model for fast and small-footprint deep neural networks

Abstract Our goal for this study is to enable the development of discrete deep neural networks (NNs), some parameters of which are discretized, as small-footprint and fast NNs for acoustic models. Three essential requirements should be met for achieving this goal; 1) the reduction in discretization errors, 2) implementation for fast processing and 3) node-size reduction of DNNs. We propose a weight-parameter model and its training algorithm for 1), an implementation scheme using a look-up table on general-purpose CPUs for 2), and a layer-biased node-pruning method for 3). The first proposed method can set proper boundaries of discretization at each NN node, resulting in reduction in discretization errors. The second method can reduce the memory usage of NNs within the cache size of the CPU by encoding the parameters of NNs. The last method can reduce the network size of the quantized DNNs by measuring the activity of each node at each layer and pruning them with a layer-dependent score. Experiments with 2-bit discrete NNs showed that our training algorithm maintained almost the same word accuracy as with 8-bit discrete NNs. We achieved a 95% reduction of memory usage and a 74% increase in speed of an NN’s forward calculation.

[1]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Patrick A. Naylor,et al.  Estimation of Glottal Closing and Opening Instants in Voiced Speech Using the YAGA Algorithm , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Ron Meir,et al.  Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights , 2014, NIPS.

[4]  Jon Barker,et al.  The second ‘chime’ speech separation and recognition challenge: Datasets, tasks and baselines , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Jon Barker,et al.  The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[6]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[7]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[8]  Rajat Raina,et al.  Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.

[9]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[10]  Yifan Gong,et al.  Restructuring of deep neural network acoustic models with singular value decomposition , 2013, INTERSPEECH.

[11]  Dong Yu,et al.  Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.

[12]  Jocelyn Sietsma,et al.  Creating artificial neural networks that generalize , 1991, Neural Networks.

[13]  Tara N. Sainath,et al.  Structured Transforms for Small-Footprint Deep Learning , 2015, NIPS.

[14]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[15]  Yifan Gong,et al.  Learning small-size DNN with output-distribution-based criteria , 2014, INTERSPEECH.

[16]  Pierre Blazevic,et al.  Mechatronic design of NAO humanoid , 2009, 2009 IEEE International Conference on Robotics and Automation.

[17]  Tatsuya Kawahara,et al.  Recent Development of Open-Source Speech Recognition Engine Julius , 2009 .

[18]  Paris Smaragdis,et al.  Bitwise Neural Networks , 2016, ArXiv.

[19]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[20]  Ebru Arisoy,et al.  Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21]  Vincent Vanhoucke,et al.  Improving the speed of neural networks on CPUs , 2011 .

[22]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[23]  Dong Yu,et al.  Exploiting sparseness in deep neural networks for large vocabulary speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[24]  Naoyuki Kanda,et al.  Boundary contraction training for acoustic models based on discrete deep neural networks , 2014, INTERSPEECH.

[25]  K. Maekawa CORPUS OF SPONTANEOUS JAPANESE : ITS DESIGN AND EVALUATION , 2003 .

[26]  Yongqiang Wang,et al.  Small-footprint high-performance deep neural network-based speech recognition using split-VQ , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[27]  Kazuhiro Nakadai,et al.  Acoustic model training based on node-wise weight boundary model increasing speed of discrete neural networks , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[28]  Kai Yu,et al.  Reshaping deep neural network for fast decoding by node-pruning , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[29]  Hon Keung Kwan,et al.  Multilayer feedforward neural networks with single powers-of-two weights , 1993, IEEE Trans. Signal Process..

[30]  Song Han,et al.  A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding , 2015 .

[31]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[32]  Wonyong Sung,et al.  X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).