On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR
暂无分享,去创建一个
[1] Hao Li,et al. Data Augmentation for end-to-end Code-Switching Speech Recognition , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[2] Kyuyeon Hwang,et al. SapAugment: Learning A Sample Adaptive Policy for Data Augmentation , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] J. Pino,et al. Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq , 2020, AACL.
[4] Kyunghyun Cho,et al. SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness , 2020, EMNLP.
[5] Sergey Rybin,et al. You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation , 2020, 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI).
[6] H. Ney,et al. Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] A. Waibel,et al. Improving Sequence-To-Sequence Speech Recognition Training with On-The-Fly Data Augmentation , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Awni Y. Hannun,et al. Self-Training for End-to-End Speech Recognition , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[10] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[11] Elizabeth Salesky,et al. Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation , 2019, ACL.
[12] Seong Joon Oh,et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Hermann Ney,et al. RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation , 2019, INTERSPEECH.
[14] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[15] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[16] Shinji Watanabe,et al. Low Resource Multi-modal Data Augmentation for End-to-end ASR , 2018, ArXiv.
[17] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[18] Graham Neubig,et al. SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation , 2018, EMNLP.
[19] Davis Liang,et al. Learning Noise-Invariant Representations for Robust Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[20] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[21] John R. Hershey,et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[22] Morgan Sonderegger,et al. Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi , 2017, INTERSPEECH.
[23] Christof Monz,et al. Data Augmentation for Low-Resource Neural Machine Translation , 2017, ACL.
[24] Rico Sennrich,et al. Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.
[25] Xiaodong Cui,et al. Data Augmentation for Deep Neural Network Acoustic Modeling , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[26] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Stefan Riezler,et al. On Some Pitfalls in Automatic Evaluation and Significance Testing for MT , 2005, IEEvaluation@ACL.
[28] Hermann Ney,et al. Bootstrap estimates for confidence intervals in ASR performance evaluation , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[29] Patrice Y. Simard,et al. Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..
[30] Eric W. Noreen,et al. Computer Intensive Methods for Testing Hypotheses: An Introduction , 1990 .
[31] Stephen Cox,et al. Some statistical issues in the comparison of speech recognition algorithms , 1989, International Conference on Acoustics, Speech, and Signal Processing,.
[32] W. Hoeffding. The Large-Sample Power of Tests Based on Permutations of Observations , 1952 .
[33] J. I. The Design of Experiments , 1936, Nature.
[34] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[35] Navdeep Jaitly,et al. Vocal Tract Length Perturbation (VTLP) improves speech recognition , 2013 .
[36] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.