Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
暂无分享,去创建一个
Jianwei Yu | Shoukang Hu | Helen Meng | Xunying Liu | Shansong Liu | Xurong Xie | Mengzhe Geng | Zi Ye
[1] Jianwei Yu,et al. Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Julien Cornebise,et al. Weight Uncertainty in Neural Networks , 2015, ArXiv.
[3] Alan F. Murray,et al. Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training , 1994, IEEE Trans. Neural Networks.
[4] Jianwei Yu,et al. LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition , 2019, INTERSPEECH.
[5] Jonathan Le Roux,et al. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[6] Brian Kingsbury,et al. Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[7] John Makhoul,et al. Speaker adaptive training: a maximum likelihood approach to speaker normalization , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[8] Shun-ichi Amari,et al. Complexity Issues in Natural Gradient Descent Method for Training Multilayer Perceptrons , 1998, Neural Computation.
[9] Sanjeev Khudanpur,et al. Investigation of transfer learning for ASR using LF-MMI trained neural networks , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[10] Nicolas Le Roux,et al. Topmoumoute Online Natural Gradient Algorithm , 2007, NIPS.
[11] Hermann Ney,et al. RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation , 2019, INTERSPEECH.
[12] Sanjeev Khudanpur,et al. A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.
[13] Yashesh Gaur,et al. On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition , 2020, INTERSPEECH.
[14] Brian Kingsbury,et al. Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300 , 2020, INTERSPEECH.
[15] Geoffrey E. Hinton,et al. Bayesian Learning for Neural Networks , 1995 .
[16] Dong Yu,et al. Automatic Speech Recognition: A Deep Learning Approach , 2014 .
[17] Yiming Wang,et al. Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI , 2016, INTERSPEECH.
[18] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Yoshua Bengio,et al. Convolutional networks for images, speech, and time series , 1998 .
[20] Alex Graves,et al. Practical Variational Inference for Neural Networks , 2011, NIPS.
[21] Hagen Soltau,et al. Fast speaker adaptive training for speech recognition , 2008, INTERSPEECH.
[22] Yiming Wang,et al. Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks , 2018, INTERSPEECH.
[23] Zdravko Kacic,et al. A novel loss function for the overall risk criterion based discriminative training of HMM models , 2000, INTERSPEECH.
[24] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[25] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[26] Tara N. Sainath,et al. Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization , 2012, INTERSPEECH.
[27] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Julien Cornebise,et al. Weight Uncertainty in Neural Network , 2015, ICML.
[29] Mark J. F. Gales,et al. Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..
[30] Jianwei Yu,et al. Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models , 2019, INTERSPEECH.
[31] Sanjeev Khudanpur,et al. End-to-end Speech Recognition Using Lattice-free MMI , 2018, INTERSPEECH.
[32] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[33] Geoffrey Zweig,et al. Transformer-Based Acoustic Modeling for Hybrid Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Daniel Povey,et al. Minimum Phone Error and I-smoothing for improved discriminative training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[35] Brian Kingsbury,et al. Boosted MMI for model and feature-space discriminative training , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[36] Sanjeev Khudanpur,et al. A study on data augmentation of reverberant speech for robust speech recognition , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Jen-Tzung Chien,et al. Variational Recurrent Neural Networks for Speech Separation , 2017, INTERSPEECH.
[38] David J. C. MacKay,et al. A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.
[39] Daniel Povey,et al. Large scale discriminative training of hidden Markov models for speech recognition , 2002, Comput. Speech Lang..
[40] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[41] Hermann Ney,et al. Frame-Level MMI as A Sequence Discriminative Training Criterion for LVCSR , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Philip C. Woodland,et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..
[43] Richard M. Schwartz,et al. A compact model for speaker-adaptive training , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[44] Jen-Tzung Chien,et al. Bayesian Recurrent Neural Network for Language Modeling , 2016, IEEE Transactions on Neural Networks and Learning Systems.
[45] Yoshua Bengio,et al. A Recurrent Latent Variable Model for Sequential Data , 2015, NIPS.
[46] Shih-Chii Liu,et al. Parameter Uncertainty for End-to-end Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] Hermann Ney,et al. Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR , 2019, INTERSPEECH.
[48] Brian Kingsbury,et al. Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[49] Hermann Ney,et al. Cumulative Adaptation for BLSTM Acoustic Models , 2019, INTERSPEECH.
[50] Steve J. Young,et al. MMIE training of large vocabulary recognition systems , 1997, Speech Communication.
[51] J. Becker,et al. The natural history of Alzheimer's disease. Description of study cohort and accuracy of diagnosis. , 1994, Archives of neurology.
[52] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[53] Xiaohui Zhang,et al. Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging , 2014, ICLR.
[54] Thomas Hain,et al. Hypothesis spaces for minimum Bayes risk training in large vocabulary speech recognition , 2006, INTERSPEECH.
[55] Xu Li,et al. Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification , 2020, Odyssey.
[56] Lalit R. Bahl,et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[57] Alex Waibel,et al. Consonant recognition by modular construction of large phonemic time-delay neural networks , 1989, International Conference on Acoustics, Speech, and Signal Processing,.
[58] Chao Zhang,et al. Parameterised sigmoid and reLU hidden activation functions for DNN acoustic modelling , 2015, INTERSPEECH.
[59] Dong Yu,et al. Error back propagation for sequence training of Context-Dependent Deep NetworkS for conversational speech transcription , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[60] Sanjeev Khudanpur,et al. A pitch extraction algorithm tuned for automatic speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[61] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[62] Hermann Ney,et al. The Rwth Asr System for Ted-Lium Release 2: Improving Hybrid Hmm With Specaugment , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[63] Subhadeep Dey,et al. Implementation of the Standard I-vector System for the Kaldi Speech Recognition Toolkit , 2016 .
[64] Tara N. Sainath,et al. Parallel Deep Neural Network Training for Big Data on Blue Gene/Q , 2017, IEEE Transactions on Parallel and Distributed Systems.
[65] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[66] Steve Renals,et al. Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[67] Mark J. F. Gales,et al. The Cambridge University 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translation , 2015, INTERSPEECH.
[68] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..
[69] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[70] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[71] Tara N. Sainath,et al. Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[72] Georg Heigold,et al. Sequence discriminative distributed training of long short-term memory recurrent neural networks , 2014, INTERSPEECH.
[73] Jianwei Yu,et al. Gaussian Process Neural Networks for Speech Recognition , 2018, INTERSPEECH.