Error detection and accuracy estimation in automatic speech recognition using deep bidirectional recurrent neural networks
[1] Jürgen Schmidhuber,et al. Learning Precise Timing with LSTM Recurrent Networks , 2003, J. Mach. Learn. Res..
[2] Hui Jiang,et al. Confidence measures for speech recognition: A survey , 2005, Speech Commun..
[3] Hermann Ney,et al. Confidence measures for large vocabulary continuous speech recognition , 2001, IEEE Trans. Speech Audio Process..
[4] Lukás Burget,et al. Extensions of recurrent neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[6] Atsushi Nakamura,et al. Real-time one-pass decoding with recurrent neural network language model for speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Björn W. Schuller,et al. Social signal classification using deep BLSTM recurrent neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Vaibhava Goel,et al. Minimum Bayes-risk automatic speech recognition , 2000, Comput. Speech Lang..
[9] Yoshua Bengio,et al. Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding , 2013, INTERSPEECH.
[10] Geoffrey Zweig,et al. Recurrent neural networks for language understanding , 2013, INTERSPEECH.
[11] James R. Glass,et al. Open-Vocabulary Spoken Utterance Retrieval using Confusion Networks , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[12] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[13] Philip C. Woodland,et al. Detecting deletions in ASR output , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Atsunori Ogawa,et al. ASR error detection and recognition rate estimation using deep bidirectional recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Hermann Ney,et al. Open vocabulary speech recognition with flat hybrid models , 2005, INTERSPEECH.
[16] Atsunori Ogawa,et al. Error type classification and word accuracy estimation using alignment features from word confusion network , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Hung-An Chang,et al. Discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[19] Yun Lei,et al. ASR error detection using recurrent neural network language model and complementary ASR , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Holger Schwenk,et al. Continuous space language models , 2007, Comput. Speech Lang..
[21] Ralf Schlüter,et al. Using word probabilities as confidence measures , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[22] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[23] Atsunori Ogawa,et al. Unsupervised discriminative language modeling using error rate estimator , 2013, INTERSPEECH.
[24] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[25] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.
[26] Shinji Watanabe,et al. Discriminative training based on an integrated view of MPE and MMI in margin and error space , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[27] Lidia Mangu,et al. Finding consensus in speech recognition , 2000.
[28] James R. Glass,et al. Recent progress in the MIT spoken lecture processing project , 2007, INTERSPEECH.
[29] Hermann Ney,et al. LSTM Neural Networks for Language Modeling , 2012, INTERSPEECH.
[30] Haibo He,et al. Learning from Imbalanced Data , 2009, IEEE Transactions on Knowledge and Data Engineering.
[31] Hynek Hermansky,et al. Posterior-based out of vocabulary word detection in telephone speech , 2009, INTERSPEECH.
[32] Atsunori Ogawa,et al. Discriminative recognition rate estimation for N-best list and its application to N-best rescoring , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[33] Tatsuya Kawahara. Benchmark test for speech recognition using the Corpus of Spontaneous Japanese , 2003.
[34] Geoffrey Zweig,et al. Recurrent conditional random field for language understanding , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Atsushi Nakamura,et al. Efficient WFST-Based One-Pass Decoding With On-The-Fly Hypothesis Rescoring in Extremely Large Vocabulary Continuous Speech Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[36] Thomas Kemp,et al. Modelling unknown words in spontaneous speech , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[37] Mari Ostendorf,et al. Using syntactic and confusion network structure for out-of-vocabulary word detection , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[38] Atsunori Ogawa,et al. Estimating Speech Recognition Accuracy Based on Error Type Classification , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[39] K. Maekawa. Corpus of Spontaneous Japanese: Its Design and Evaluation , 2003.
[40] Thomas Schaaf,et al. Confidence measures for spontaneous speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[41] Mark Dredze,et al. Contextual Information Improves OOV Detection in Speech , 2010, NAACL.
[42] Thomas Schaaf,et al. Estimating confidence using word lattices , 1997, EUROSPEECH.
[43] Hynek Hermansky,et al. Combination of strongly and weakly constrained recognizers for reliable detection of OOVs , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[44] Andreas Stolcke,et al. Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..
[45] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[46] Mitch Weintraub,et al. Neural-network based measures of confidence for word recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[47] Philip C. Woodland,et al. Combining Information Sources for Confidence Estimation with CRF Models , 2011, INTERSPEECH.
[48] Andreas Stolcke,et al. Finding consensus among words: lattice-based word error minimization , 1999, EUROSPEECH.
[49] Kaisheng Yao,et al. Estimating confidence scores on ASR results using recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[50] L. Deng,et al. Calibration of Confidence Measures in Speech Recognition , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[51] Guillaume Gravier,et al. Is it time to switch to word embedding and recurrent neural networks for spoken language understanding? , 2015, INTERSPEECH.
[52] Thomas Schaaf. Detection of OOV words using generalized word models and a semantic class language model , 2001, INTERSPEECH.
[53] Patrick Gros,et al. CRF-based combination of contextual features to improve a posteriori word-level confidence measures , 2010, INTERSPEECH.
[54] Gunnar Evermann,et al. Large vocabulary decoding and confidence estimation using word posterior probabilities , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[55] Hermann Ney,et al. On the Relationship Between Bayes Risk and Word Error Rate in ASR , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[56] Shinji Watanabe,et al. Automatic determination of acoustic model topology using variational Bayesian estimation and clustering for large vocabulary continuous speech recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[57] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001.
[58] Atsunori Ogawa,et al. Recognition rate estimation based on word alignment network and discriminative error type classification , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).