Unsupervised adaptation for deep neural networks using a linear least squares method

In this paper, we propose a novel model-based adaptation method for deep neural networks built on a linear least squares solution. The proposed algorithm can perform unsupervised adaptation even when the automatic transcripts have a word error rate of 60-70%. We evaluate the algorithm on low-resource languages from the IARPA BABEL program, namely Assamese, Bengali, Haitian Creole, Lao and Zulu. Our experiments focus on unsupervised speaker, dialect and environment adaptation, and we show that the method improves both speech recognition and keyword search performance.
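The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of how a linear least squares adaptation step could look: a single affine transform is estimated in closed form to map the network's hidden activations on the adaptation data toward target activations derived from the first-pass automatic transcripts. The function names, the ridge term l2, and the use of NumPy are illustrative assumptions, not details taken from the paper.

# Hedged sketch: closed-form linear least-squares adaptation transform.
# Assumption (not from the paper): adaptation is cast as estimating an
# affine map W that takes hidden activations H toward target activations T
# obtained from the automatic (possibly very errorful) transcripts.
import numpy as np

def estimate_linear_transform(H, T, l2=1e-3):
    """Solve min_W ||[H, 1] W - T||^2 + l2 ||W||^2 via the normal equations.

    H : (n_frames, d) hidden activations on the adaptation data
    T : (n_frames, d) target activations (e.g., from alignments of the
        first-pass automatic transcripts)
    """
    n, d = H.shape
    H_aug = np.hstack([H, np.ones((n, 1))])      # append a bias column
    A = H_aug.T @ H_aug + l2 * np.eye(d + 1)     # regularized Gram matrix
    B = H_aug.T @ T
    return np.linalg.solve(A, B)                 # (d+1, d) affine transform

def apply_transform(H, W):
    n = H.shape[0]
    H_aug = np.hstack([H, np.ones((n, 1))])
    return H_aug @ W

# Usage with random placeholders standing in for real activations:
H = np.random.randn(1000, 512)
T = H + 0.1 * np.random.randn(1000, 512)
W = estimate_linear_transform(H, T)
H_adapted = apply_transform(H, W)

Because the solution is closed form, such a transform can be re-estimated per speaker, dialect or environment without backpropagation, which is one plausible reason a least-squares formulation is attractive for unsupervised adaptation; the actual formulation used in the paper may differ.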
