Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data