Utterance Invariant Training for Hybrid Two-Pass End-to-End Speech Recognition
Ankur Kumar, Hejung Yang, Chanwoo Kim, Abhinav Garg, Mehul Kumar, Kwangyoun Kim, Shatrughan Singh, Sachin Singh, Dhananjaya Gowda, Jiyeon Kim, Sichen Jin