暂无分享,去创建一个
[1] Steve Renals,et al. Word Error Rate Estimation for Speech Recognition: e-WER , 2018, ACL.
[2] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[3] Stephan Vogel,et al. Speech recognition challenge in the wild: Arabic MGB-3 , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[4] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[5] Lukás Burget,et al. Sequence-discriminative training of deep neural networks , 2013, INTERSPEECH.
[6] Kaare Brandt Petersen,et al. Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music , 2006, ISMIR.
[7] Gabriel Synnaeve,et al. Wav2Letter++: A Fast Open-source Speech Recognition System , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Mark J. F. Gales,et al. The MGB challenge: Evaluating multi-genre broadcast media recognition , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[9] Ahmed Mohamed Abdel Maksoud Ali,et al. Multi-dialect Arabic broadcast speech recognition , 2018 .
[10] Yoshua Bengio,et al. Convolutional networks for images, speech, and time series , 1998 .
[11] Mikko Kurimo,et al. Aalto system for the 2017 Arabic multi-genre broadcast challenge , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[12] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[13] Yiming Wang,et al. Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI , 2016, INTERSPEECH.
[14] Steve Renals,et al. Revisiting hybrid and GMM-HMM system combination techniques , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] James R. Glass,et al. The MGB-2 challenge: Arabic multi-dialect broadcast media recognition , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[16] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[17] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[18] Yu Wang,et al. PHONETIC AND GRAPHEMIC SYSTEMS FOR MULTI-GENRE BROADCAST TRANSCRIPTION , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Sanjeev Khudanpur,et al. Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[21] Sameer Khurana,et al. QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition: MGB-2 challenge , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[22] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Vysoké Učení,et al. Statistical Language Models Based on Neural Networks , 2012 .
[24] James R. Glass,et al. Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition , 2018, Odyssey.
[25] J.H.L. Hansen,et al. High resolution speech feature parametrization for monophone-based stressed speech recognition , 2000, IEEE Signal Processing Letters.
[26] Sanjeev Khudanpur,et al. A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.