Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings