Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition
[1] Shigeru Katagiri,et al. Speaker Adaptive Training using Deep Neural Networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Hongyi Zhang,et al. mixup: Beyond Empirical Risk Minimization , 2017, ICLR.
[3] Dong Yu,et al. Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[4] Khe Chai Sim,et al. On combining DNN and GMM with unsupervised speaker adaptation for robust automatic speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Peter Bell,et al. Structured output layer with auxiliary targets for context-dependent acoustic modelling , 2015, INTERSPEECH.
[6] Yifan Gong,et al. Factorized adaptation for deep neural network , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Kaisheng Yao,et al. Adaptation of context-dependent deep neural networks for automatic speech recognition , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[8] Emmanuel Vincent,et al. DNN Uncertainty Propagation Using GMM-Derived Uncertainty Features for Noise Robust ASR , 2018, IEEE Signal Processing Letters.
[9] Jan Cernocký,et al. Improved feature processing for deep neural networks , 2013, INTERSPEECH.
[10] Srinivasan Umesh,et al. Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMM , 2015, INTERSPEECH.
[11] Tatsuya Harada,et al. Learning from Between-class Examples for Deep Sound Recognition , 2017, ICLR.
[12] Pietro Laface,et al. Adaptation of Artificial Neural Networks Avoiding Catastrophic Forgetting , 2006, The 2006 IEEE International Joint Conference on Neural Network Proceedings.
[13] Natalia A. Tomashenko,et al. GMM-derived features for effective unsupervised adaptation of deep neural network acoustic models , 2015, INTERSPEECH.
[14] Yiming Wang,et al. Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs , 2018, IEEE Signal Processing Letters.
[15] Philip C. Woodland. Speaker adaptation for continuous density HMMs: a review , 2001.
[16] Hank Liao,et al. Speaker adaptation of context dependent deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[17] Pietro Laface,et al. Adaptation of Hybrid ANN/HMM Models Using Linear Hidden Transformations and Conservative Training , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[18] Steve Renals,et al. Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[19] Khe Chai Sim,et al. Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems , 2010, INTERSPEECH.
[20] I-Fan Chen,et al. Feature space maximum a posteriori linear regression for adaptation of deep neural networks , 2014, INTERSPEECH.
[21] Kaisheng Yao,et al. KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[22] Yiming Wang,et al. Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI , 2016, INTERSPEECH.
[23] Yongqiang Wang,et al. Adaptation of deep neural network acoustic models using factorised i-vectors , 2014, INTERSPEECH.
[24] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011.
[25] Yuuki Tachioka,et al. Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN , 2015, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[26] I-Fan Chen,et al. Maximum a posteriori adaptation of network parameters in deep models , 2015, INTERSPEECH.
[27] Natalia A. Tomashenko,et al. Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing , 2014, INTERSPEECH.
[28] Brian Kingsbury,et al. Boosted MMI for model and feature-space discriminative training , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[29] Koichi Shinoda,et al. Speaker adaptation of deep neural networks using a hierarchy of output layers , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[30] Sanjeev Khudanpur,et al. A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.
[31] Dong Yu,et al. Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[32] Tatsuya Kawahara,et al. Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation , 2015, INTERSPEECH.
[33] Lalit R. Bahl,et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[34] Themos Stafylakis,et al. I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Yannick Estève,et al. On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models , 2016, INTERSPEECH.
[36] Hui Lin,et al. Deep neural networks with auxiliary Gaussian mixture models for real-time speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[37] Dmitry Popov,et al. An Investigation of Mixup Training Strategies for Acoustic Models in ASR , 2018, INTERSPEECH.
[38] Hui Jiang,et al. Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[39] Sree Hari Krishnan Parthasarathi,et al. fMLLR based feature-space speaker adaptation of DNN acoustic models , 2015, INTERSPEECH.
[40] Yannick Estève,et al. Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models , 2018, LREC.
[41] Andrew W. Senior,et al. Improving DNN speaker independence with I-vector inputs , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Mark J. F. Gales,et al. Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..
[43] Mark J. F. Gales,et al. Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition , 2016, INTERSPEECH.
[44] Chin-Hui Lee,et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..