ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
[1] Miguel Del Rio, et al. Earnings-22: A Practical Benchmark for Accents in the Wild, 2022, arXiv.
[2] Patrick von Platen, et al. XTREME-S: Evaluating Cross-lingual Speech Representations, 2022, Interspeech.
[3] Juan Pino, et al. XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale, 2021, Interspeech.
[4] Tara N. Sainath, et al. BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition, 2021, IEEE Journal of Selected Topics in Signal Processing.
[5] Jong Wook Kim, et al. Robust Speech Recognition via Large-Scale Weak Supervision, 2022, ICML.
[6] Vijay Janapa Reddi, et al. The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage, 2021, NeurIPS Datasets and Benchmarks.
[7] Alexander M. Rush, et al. Datasets: A Community Library for Natural Language Processing, 2021, EMNLP.
[8] Javier Jorge, et al. Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization, 2021, Interspeech.
[9] Xiangang Li, et al. GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio, 2021, Interspeech.
[10] Andy T. Liu, et al. SUPERB: Speech processing Universal PERformance Benchmark, 2021, Interspeech.
[11] Brian Kingsbury, et al. On the limit of English conversational speech recognition, 2021, Interspeech.
[12] Shinji Watanabe, et al. SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition, 2021, Interspeech.
[13] Mohammad Norouzi, et al. SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network, 2021, arXiv.
[14] Ronny Krashinsky, et al. NVIDIA A100 Tensor Core GPU: Performance and Innovation, 2021, IEEE Micro.
[15] Emmanuel Dupoux, et al. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation, 2021, ACL.
[16] Michael Auli, et al. Multilingual Speech Translation from Efficient Finetuning of Pretrained Models, 2020, ACL.
[17] Gabriel Synnaeve, et al. Rethinking Evaluation in ASR: Are Our Models Robust Enough?, 2020, Interspeech.
[18] Pavel Golik, et al. How Might We Create Better Benchmarks for Speech Recognition?, 2021, BPPF.
[19] Gabriel Synnaeve, et al. MLS: A Large-Scale Multilingual Dataset for Speech Research, 2020, Interspeech.
[20] Abdel-rahman Mohamed, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.
[21] David Patterson, et al. A domain-specific supercomputer for training deep neural networks, 2020, Commun. ACM.
[22] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, Interspeech.
[23] Francis M. Tyers, et al. Common Voice: A Massively-Multilingual Speech Corpus, 2019, LREC.
[24] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[25] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[26] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[27] Edouard Grave, et al. End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures, 2019, arXiv.
[28] Junichi Yamagishi, et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92), 2019.
[29] Jia Xin Koh, et al. Building the Singapore English National Speech Corpus, 2019, Interspeech.
[30] Kate Knill, et al. Impact of ASR Performance on Spoken Grammatical Error Detection, 2019, Interspeech.
[31] Boris Ginsburg, et al. NeMo: a toolkit for building AI applications using Neural Modules, 2019, arXiv.
[32] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.
[33] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, Interspeech.
[34] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[35] Yu Wang, et al. Impact of ASR Performance on Free Speaking Language Assessment, 2018, Interspeech.
[36] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[37] Yannick Estève, et al. TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation, 2018, SPECOM.
[38] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[39] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[40] Jon Barker, et al. An analysis of environment, microphone and data simulation mismatches in robust speech recognition, 2017, Comput. Speech Lang.
[41] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[42] Philipp Koehn, et al. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
[43] Richard M. Stern, et al. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[44] Ted Briscoe, et al. Grammatical error correction using neural machine translation, 2016, NAACL.
[45] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2015, ICASSP.
[46] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[47] Cong Liu, et al. The USTC-iFlytek System for CHiME-4 Challenge, 2016.
[48] D. Sculley, et al. Hidden Technical Debt in Machine Learning Systems, 2015, NIPS.
[49] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, ICASSP.
[50] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[51] Erich Elsen, et al. Deep Speech: Scaling up end-to-end speech recognition, 2014, arXiv.
[52] Navdeep Jaitly, et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks, 2014, ICML.
[53] Philipp Koehn, et al. Scalable Modified Kneser-Ney Language Model Estimation, 2013, ACL.
[54] Alex Graves, et al. Sequence Transduction with Recurrent Neural Networks, 2012, arXiv.
[55] Kenneth Heafield, et al. KenLM: Faster and Smaller Language Model Queries, 2011, WMT@EMNLP.
[56] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[57] Michiel Bacchiani, et al. Restoring punctuation and capitalization in transcribed speech, 2009, ICASSP.
[58] Richard M. Stern, et al. Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis, 2008, Interspeech.
[59] Thomas Hain, et al. Recognition and understanding of meetings: the AMI and AMIDA projects, 2007, ASRU.
[60] Jean Carletta, et al. Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus, 2007, Lang. Resour. Evaluation.
[61] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[62] Ji-Hwan Kim, et al. A combined punctuation generation and speech recognition system and its performance enhancement using prosody, 2003, Speech Commun.
[63] Ji-Hwan Kim, et al. The use of prosody in a combined system for punctuation generation and speech recognition, 2001, Interspeech.
[64] Danqi Chen, et al. of the Association for Computational Linguistics: , 2001.
[65] C. Julian Chen, et al. Speech recognition with automatic punctuation, 1999, EUROSPEECH.
[66] John D. Lafferty, et al. Cyberpunc: a lightweight punctuation annotation system for speech, 1998, ICASSP.
[67] Hermann Ney, et al. On structuring probabilistic dependences in stochastic language modelling, 1994, Comput. Speech Lang.
[68] Jonathan G. Fiscus, et al. DARPA TIMIT: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1, 1993.
[69] John J. Godfrey, et al. SWITCHBOARD: telephone speech corpus for research and development, 1992, ICASSP.