Learning utterance-level normalisation using Variational Autoencoders for robust automatic speech recognition
暂无分享,去创建一个
[1] Geoffrey Zweig,et al. An introduction to computational networks and the computational network toolkit (invited talk) , 2014, INTERSPEECH.
[2] Themos Stafylakis,et al. I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Hui Jiang,et al. Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[4] Shinji Watanabe,et al. Sequence summarizing neural network for speaker adaptation , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[6] Mark J. F. Gales,et al. Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..
[7] Dong Yu,et al. Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[8] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[9] Geoffrey E. Hinton,et al. Keeping the neural networks simple by minimizing the description length of the weights , 1993, COLT '93.
[10] Max Welling,et al. Semi-supervised Learning with Deep Generative Models , 2014, NIPS.
[11] Michael I. Jordan,et al. An Introduction to Variational Methods for Graphical Models , 1999, Machine-mediated learning.
[12] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[13] Florin Curelaru,et al. Front-End Factor Analysis For Speaker Verification , 2018, 2018 International Conference on Communications (COMM).
[14] Philip C. Woodland,et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..
[15] Navdeep Jaitly,et al. Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition , 2012, INTERSPEECH.
[16] Khe Chai Sim,et al. On combining i-vectors and discriminative adaptation methods for unsupervised speaker normalization in DNN acoustic models , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Yoshua Bengio,et al. A Recurrent Latent Variable Model for Sequential Data , 2015, NIPS.
[18] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[19] John Salvatier,et al. Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.
[20] George Saon,et al. Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[21] Geoffrey E. Hinton,et al. Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[22] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[23] Kaisheng Yao,et al. KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[24] Steve Renals,et al. Multi-level adaptive networks in tandem and hybrid ASR systems , 2013, ICASSP.
[25] Steve Renals,et al. Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[26] Satoshi Nakamura,et al. Stochastic Gradient Variational Bayes for deep learning-based ASR , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[27] Khe Chai Sim,et al. An investigation of augmenting speaker representations to improve speaker normalisation for DNN-based speech recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Daniel P. W. Ellis,et al. Tandem connectionist feature extraction for conventional HMM systems , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[29] Geoffrey E. Hinton,et al. Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.