Learning Word-Level Confidence for Subword End-To-End ASR
Tara N. Sainath | Ian McGraw | Liangliang Cao | Wei Li | Yu Zhang | Rohit Prabhavalkar | Yanzhang He | Qiujia Li | David Qiu | Bo Li | Deepti Bhatia | Ke Hu