Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model
暂无分享,去创建一个
Qi Liu | Hao Li | Kai Yu | Yizhou Lu | Zhehuai Chen | Mingkun Huang | Kai Yu | Zhehuai Chen | Qi Liu | Hao Li | Yizhou Lu | Mingkun Huang
[1] H. Bourlard,et al. Links Between Markov Models and Multilayer Perceptrons , 1990, IEEE Trans. Pattern Anal. Mach. Intell..
[2] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[3] Fernando Pereira,et al. Weighted finite-state transducers in speech recognition , 2002, Comput. Speech Lang..
[4] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[5] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[6] Hervé Bourlard,et al. Continuous speech recognition using multilayer perceptrons with hidden Markov models , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[7] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[8] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[9] Tao Xu,et al. Phone Synchronous Decoding with CTC Lattice , 2016, INTERSPEECH.
[10] Ting Liu,et al. Attention-over-Attention Neural Networks for Reading Comprehension , 2016, ACL.
[11] Qiang Huo,et al. A study on effects of implicit and explicit language model information for DBLSTM-CTC based handwriting recognition , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).
[12] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[13] Hervé Bourlard,et al. Neural networks for statistical recognition of continuous speech , 1995, Proc. IEEE.
[14] Hermann Ney,et al. Unsupervised training of acoustic models for large vocabulary continuous speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.
[15] Hairong Liu,et al. Exploring neural transducers for end-to-end speech recognition , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[16] Tomoki Toda,et al. Back-Translation-Style Data Augmentation for end-to-end ASR , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[17] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[18] Yu Hu,et al. Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency , 2015, ArXiv.
[19] Liang Lu,et al. On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Shinji Watanabe,et al. End-to-end Speech Recognition With Word-Based Rnn Language Models , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[21] Bhuvana Ramabhadran,et al. Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, INTERSPEECH.
[22] Tara N. Sainath,et al. Streaming End-to-end Speech Recognition for Mobile Devices , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[24] Liang Lu,et al. Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition , 2017, INTERSPEECH.
[25] Geoffrey Zweig,et al. Advances in all-neural speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Kai Yu,et al. Cluster adaptive training for deep neural network , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[28] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[29] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[30] Florian Metze,et al. Acoustic-to-Word Recognition with Sequence-to-Sequence Models , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[31] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[32] Shinji Watanabe,et al. Multi-Modal Data Augmentation for End-to-end ASR , 2018, INTERSPEECH.
[33] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Yanmin Qian,et al. Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[35] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[36] Chengzhu Yu,et al. A Multistage Training Framework for Acoustic-to-Word Model , 2018, INTERSPEECH.
[37] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[38] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Hao Li,et al. OOV Words Extension for Modular Neural Acoustics-to-Word Model , 2019 .
[40] Florian Metze,et al. An empirical exploration of CTC acoustic models , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[42] L. Baum,et al. An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology , 1967 .
[43] Adam Coates,et al. Cold Fusion: Training Seq2Seq Models Together with Language Models , 2017, INTERSPEECH.
[44] Qi Liu,et al. On Modular Training of Neural Acoustics-to-Word Model for LVCSR , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] Yifan Gong,et al. Improving RNN Transducer Modeling for End-to-End Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[46] Tom Bagby,et al. End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow , 2017, INTERSPEECH.
[47] L. Baum,et al. Growth transformations for functions on manifolds. , 1968 .
[48] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Brian Kingsbury,et al. Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[50] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.