Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR
暂无分享,去创建一个
Tatsuya Kawahara | Shinsuke Sakai | Hirofumi Inaguma | Masato Mimura | Tatsuya Kawahara | S. Sakai | H. Inaguma | M. Mimura
[1] Yifan Gong,et al. Advancing Acoustic-to-Word CTC Model , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[3] Navdeep Jaitly,et al. Towards Better Decoding and Language Model Integration in Sequence to Sequence Models , 2016, INTERSPEECH.
[4] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[5] Yifan Gong,et al. Acoustic-to-word model without OOV , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[6] Xiaodong Cui,et al. English Conversational Telephone Speech Recognition by Humans and Machines , 2017, INTERSPEECH.
[7] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[8] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[9] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[10] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[11] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[12] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[13] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[14] Hairong Liu,et al. Exploring neural transducers for end-to-end speech recognition , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[15] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Tara N. Sainath,et al. An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Andrew W. Senior,et al. Fast and accurate recurrent neural network acoustic models for speech recognition , 2015, INTERSPEECH.
[18] Hakan Inan,et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling , 2016, ICLR.
[19] Tara N. Sainath,et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition , 2017, INTERSPEECH.
[20] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[21] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Lior Wolf,et al. Using the Output Embedding to Improve Language Models , 2016, EACL.
[23] Liang Lu,et al. On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Yonatan Belinkov,et al. Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems , 2017, NIPS.
[26] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[27] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[28] K. Maekawa. CORPUS OF SPONTANEOUS JAPANESE : ITS DESIGN AND EVALUATION , 2003 .
[29] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Tatsuya Kawahara,et al. Acoustic-to-Word Attention-Based Model Complemented with Character-Level CTC-Based Model , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Brian Kingsbury,et al. Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[33] Andreas Stolcke,et al. Comparing Human and Machine Errors in Conversational Speech Transcription , 2017, INTERSPEECH.
[34] Liang Lu,et al. Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition , 2017, INTERSPEECH.
[35] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Bhuvana Ramabhadran,et al. Empirical Exploration of Novel Architectures and Objectives for Language Models , 2017, INTERSPEECH.
[37] Hagen Soltau,et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.
[38] Bhuvana Ramabhadran,et al. Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, INTERSPEECH.