High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model
Zhong Meng | Yifan Gong | Rui Zhao | Jinyu Li | Amit Das | Jeremy H. M. Wong | Eric Sun