Deep neural network based acoustic model parameter reduction using manifold regularized low rank matrix factorization

In this paper, we propose a deep neural network (DNN) model parameter reduction method based on manifold regularized low rank matrix factorization, with the aim of reducing the computational complexity of acoustic models for low-resource embedded devices. One of the most common DNN parameter reduction techniques is truncated singular value decomposition (TSVD). TSVD reduces the number of parameters by approximating a target weight matrix with a low rank matrix that minimizes the Euclidean (Frobenius) norm of the approximation error. In this work, we question whether the Euclidean norm is an appropriate objective function for factorizing DNN weight matrices, because DNNs are known to learn the nonlinear manifold of acoustic features. Therefore, in order to exploit this manifold structure for robust parameter reduction, we propose a manifold regularized matrix factorization approach. The proposed method was evaluated on the TIMIT phone recognition task.
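As a rough illustration of the TSVD baseline mentioned above (not the proposed manifold regularized method), the following NumPy sketch factorizes one weight matrix into two thinner matrices and counts parameters. The layer size, the rank, the random weights, and the function name `tsvd_factorize` are all assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np


def tsvd_factorize(W, rank):
    """Split W (m x n) into A (m x rank) and B (rank x n) via truncated SVD.

    A @ B is the best rank-`rank` approximation of W in the Frobenius norm
    (Eckart-Young theorem).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # fold the kept singular values into the left factor
    B = Vt[:rank, :]
    return A, B


# Hypothetical layer: 512 inputs, 256 outputs, random weights (assumption).
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
rank = 64  # hypothetical target rank

A, B = tsvd_factorize(W, rank)

params_before = W.size           # m * n
params_after = A.size + B.size   # rank * (m + n)
rel_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print(f"parameters: {params_before} -> {params_after}")
print(f"relative Frobenius error: {rel_error:.3f}")
```

With these assumed sizes the parameter count drops from 512 × 256 = 131072 to 64 × (512 + 256) = 49152. A single layer `y = W x` would then be replaced by two smaller layers, `y = A (B x)`. The approximation error here is measured only in the Frobenius norm, which is the objective the abstract calls into question.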
