Frank Zhang | Julian Chan | Fuchun Peng | Yatharth Saraf | Vimal Manohar | Yangyang Shi | Nayan Singhal | Xiaohui Zhang | Mike Seltzer | David Zhang