SUPERB: Speech processing Universal PERformance Benchmark
Andy T. Liu | Yist Y. Lin | Abdel-rahman Mohamed | Hung-yi Lee | Shinji Watanabe | Shang-Wen Li | Po-Han Chi | Shu-wen Yang | Da-Rong Liu | Yung-Sung Chuang | W. Tseng | Cheng-I Lai | Jiatong Shi | Zili Huang | Guan-Ting Lin | Xuankai Chang | Kushal Lakhotia | Shuyan Dong | Tzu-hsien Huang | Ko-tik Lee
[1] Hung-yi Lee,et al. FragmentVC: Any-To-Any Voice Conversion by End-To-End Extracting and Fusing Fine-Grained Voice Fragments with Attention , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Shang-Wen Li,et al. TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[3] Florian Metze,et al. QUESST2014: Evaluating Query-by-Example Speech Search in a zero-resource setting with real-life queries , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Hung-yi Lee,et al. Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Aren Jansen,et al. Towards Learning a Universal Non-Semantic Representation of Speech , 2020, INTERSPEECH.
[6] Francesco Caltagirone,et al. Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces , 2018, ArXiv.
[7] Abhinav Gupta,et al. Scaling and Benchmarking Self-Supervised Visual Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Matthijs Douze,et al. Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.
[9] James R. Glass,et al. Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies , 2020, Interspeech.
[10] Yoshua Bengio,et al. Speech Model Pre-training for End-to-End Spoken Language Understanding , 2019, INTERSPEECH.
[11] Ewan Dunbar,et al. The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling , 2020, ArXiv.
[12] Jia Deng,et al. How Useful Is Self-Supervised Pretraining for Visual Tasks? , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Yoshua Bengio,et al. Multi-Task Self-Supervised Learning for Robust Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Hao Tang,et al. An Unsupervised Autoregressive Model for Speech Representation Learning , 2019, INTERSPEECH.
[15] Shang-Wen Li,et al. Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Sanjeev Khudanpur,et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[19] Abdel-rahman Mohamed,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[20] Ruslan Salakhutdinov,et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[21] Toni Giorgino,et al. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package , 2009 .
[22] Yannick Estève,et al. Recent Advances in End-to-End Spoken Language Understanding , 2019, SLSP.
[23] Carlos Busso,et al. IEMOCAP: interactive emotional dyadic motion capture database , 2008, Lang. Resour. Evaluation.
[24] James R. Glass,et al. Vector-Quantized Autoregressive Predictive Coding , 2020, INTERSPEECH.
[25] Antoine Deleforge,et al. LibriMix: An Open-Source Dataset for Generalizable Speech Separation , 2020, arXiv:2005.11262.
[26] Kenneth Heafield,et al. KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.
[27] Yuzong Liu,et al. DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization , 2020, ArXiv.
[28] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[29] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[30] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[31] Mireia Díez,et al. GTTS-EHU Systems for QUESST at MediaEval 2014 , 2014, MediaEval.
[32] Meng Li,et al. Exploring wav2vec 2.0 on speaker verification and language identification , 2020, Interspeech.
[33] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[34] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[35] Yoshua Bengio,et al. Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks , 2019, INTERSPEECH.
[36] Antonio Bonafonte,et al. Problem-Agnostic Speech Embeddings for Multi-Speaker Text-to-Speech with SampleRNN , 2019, ArXiv.
[37] Ruslan Salakhutdinov,et al. HuBERT: How Much Can a Bad Teacher Benefit ASR Pre-Training? , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Gabriel Synnaeve,et al. Wav2Letter++: A Fast Open-source Speech Recognition System , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Armand Joulin,et al. Unsupervised Pretraining Transfers Well Across Languages , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Naoyuki Kanda,et al. End-to-End Neural Speaker Diarization with Permutation-Free Objectives , 2019, INTERSPEECH.
[41] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[42] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.
[43] Joon Son Chung,et al. Voxceleb: Large-scale speaker verification in the wild , 2020, Comput. Speech Lang.
[44] wav2vec 2.0 on Speaker Verification and Language Identification , 2021 .