Scaling Speech Technology to 1,000+ Languages
Alexis Conneau | Michael Auli | Alexei Baevski | Ali Elkahky | Yossi Adi | Paden Tomasello | Vineel Pratap | Wei-Ning Hsu | Zhaoheng Ni | Sayani Kundu | Andros Tjandra | Apoorv Vyas | Bowen Shi | Xiaohui Zhang | Arun Babu | Maryam Fazel-Zarandi
[1] Jinyu Li,et al. Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling , 2023, ArXiv.
[2] Tara N. Sainath,et al. Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages , 2023, ArXiv.
[3] Shinji Watanabe,et al. Improving Massively Multilingual ASR With Auxiliary CTC Objectives , 2023, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] H. Saruwatari,et al. Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining , 2023, ArXiv.
[5] M. Seltzer,et al. Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities , 2022, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] H. Zen,et al. Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech , 2022, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] A. Conneau,et al. FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech , 2022, 2022 IEEE Spoken Language Technology Workshop (SLT).
[8] Alexander H. Liu,et al. Towards End-to-End Unsupervised Speech Recognition , 2022, 2022 IEEE Spoken Language Technology Workshop (SLT).
[9] Jong Wook Kim,et al. Robust Speech Recognition via Large-Scale Weak Supervision , 2022, ICML.
[10] Nithin Rao Koluguri,et al. AmberNet: A Compact End-to-End Model for Spoken Language Identification , 2022, ArXiv.
[11] Daniel Whitenack,et al. Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks , 2022, EMNLP.
[12] David R. Mortensen,et al. ASR2K: Speech Recognition for Around 2000 Languages without Audio , 2022, INTERSPEECH.
[13] David Ifeoluwa Adelani,et al. BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus , 2022, INTERSPEECH.
[14] Z. Chen,et al. Building Machine Translation Systems for the Next Thousand Languages , 2022, ArXiv.
[15] H. Zen,et al. MAESTRO: Matched Speech Text Representations through Modality Matching , 2022, INTERSPEECH.
[16] Ankur Bapna,et al. mSLAM: Massively multilingual joint pre-training for speech and text , 2022, ArXiv.
[17] Ronan Collobert,et al. Flashlight: Enabling Innovation in Tools for Machine Learning , 2022, ICML.
[18] Simon J. Greenhill,et al. Global predictors of language endangerment and the future of linguistic diversity , 2021, Nature Ecology & Evolution.
[19] Arnaldo Cândido Júnior,et al. YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone , 2021, ICML.
[20] Juan Pino,et al. XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale , 2021, INTERSPEECH.
[21] Mitesh M. Khapra,et al. Towards Building ASR Systems for the Next Billion Users , 2021, AAAI.
[22] Ronan Collobert,et al. Pseudo-Labeling for Massively Multilingual Speech Recognition , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Edward Z. Yang,et al. Torchaudio: Building Blocks for Audio and Speech Processing , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Diptanu Gon Choudhury,et al. Improved Language Identification Through Cross-Lingual Self-Supervised Learning , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Ronan Collobert,et al. Star Temporal Classification: Sequence Modeling with Partially Labeled Data , 2022, Neural Information Processing Systems.
[26] Kenneth Ward Church,et al. W-CTC: a Connectionist Temporal Classification Loss with Wild Cards , 2022, ICLR.
[27] Tao Qin,et al. A Survey on Neural Speech Synthesis , 2021, ArXiv.
[28] Jungil Kong,et al. Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech , 2021, ICML.
[29] Titouan Parcollet,et al. SpeechBrain: A General-Purpose Speech Toolkit , 2021, ArXiv.
[30] Michael Auli,et al. Unsupervised Speech Recognition , 2021, NeurIPS.
[31] Tara N. Sainath,et al. Scaling End-to-End Models for Large-Scale Multilingual ASR , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[32] Olatunji Ruwase,et al. ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep learning , 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[33] Mohammad Norouzi,et al. SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network , 2021, ArXiv.
[34] Gabriel Synnaeve,et al. Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training , 2021, Interspeech.
[35] F. Soong,et al. Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis , 2021, ArXiv.
[36] Emmanuel Dupoux,et al. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation , 2021, ACL.
[37] Meng Li,et al. Exploring wav2vec 2.0 on speaker verification and language identification , 2020, Interspeech.
[38] Jorgen Valk,et al. VoxLingua107: A Dataset for Spoken Language Recognition , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[39] Gabriel Synnaeve,et al. Rethinking Evaluation in ASR: Are Our Models Robust Enough? , 2020, Interspeech.
[40] Ronan Collobert,et al. Unsupervised Cross-lingual Representation Learning for Speech Recognition , 2020, Interspeech.
[41] Gabriel Synnaeve,et al. MLS: A Large-Scale Multilingual Dataset for Speech Research , 2020, INTERSPEECH.
[42] Jaehyeon Kim,et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis , 2020, NeurIPS.
[43] Tian Huey Teh,et al. Phonological Features for 0-shot Multilingual Speech Synthesis , 2020, INTERSPEECH.
[44] Ondrej Dusek,et al. One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech , 2020, INTERSPEECH.
[45] Lujun Li,et al. CTC-Segmentation of Large Corpora for German End-to-End Speech Recognition , 2020, SPECOM.
[46] Gabriel Synnaeve,et al. Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters , 2020, INTERSPEECH.
[47] Gabriel Synnaeve,et al. Real Time Speech Enhancement in the Waveform Domain , 2020, INTERSPEECH.
[48] Abdel-rahman Mohamed,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[49] David Yarowsky,et al. The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration , 2020, LREC.
[50] Dan Jurafsky,et al. Racial disparities in automated speech recognition , 2020, Proceedings of the National Academy of Sciences.
[51] 知秀 柴田. Understand in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020.
[52] Marjan Ghazvininejad,et al. Multilingual Denoising Pre-training for Neural Machine Translation , 2020, Transactions of the Association for Computational Linguistics.
[53] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2019, LREC.
[54] Myle Ott,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[55] Michael Auli,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[56] Tara N. Sainath,et al. Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model , 2019, INTERSPEECH.
[57] Heiga Zen,et al. Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning , 2019, INTERSPEECH.
[58] Alan W. Black,et al. CMU Wilderness Multilingual Speech Dataset , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[59] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[60] Gabriel Synnaeve,et al. Who Needs Words? Lexicon-Free Speech Recognition , 2019, INTERSPEECH.
[61] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[62] Mona Attariyan,et al. Parameter-Efficient Transfer Learning for NLP , 2019, ICML.
[63] Guillaume Lample,et al. Cross-lingual Language Model Pretraining , 2019, NeurIPS.
[64] Gabriel Synnaeve,et al. Wav2Letter++: A Fast Open-source Speech Recognition System , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[65] Tara N. Sainath,et al. Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[66] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[67] Shinji Watanabe,et al. Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[68] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[69] Kevin Knight,et al. Out-of-the-box Universal Romanization Tool uroman , 2018, ACL.
[70] Karen Simonyan,et al. The challenge of realistic music generation: modelling raw audio at scale , 2018, NeurIPS.
[71] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[72] Tara N. Sainath,et al. Multilingual Speech Recognition with a Single End-to-End Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[73] Jean Carrive,et al. INA's MIREX 2018 Music and Speech Detection System , 2018.
[74] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[75] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[76] Samy Bengio,et al. Density estimation using Real NVP , 2016, ICLR.
[77] Gabriel Synnaeve,et al. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System , 2016, ArXiv.
[78] Tianqi Chen,et al. Training Deep Nets with Sublinear Memory Cost , 2016, ArXiv.
[79] Quoc V. Le,et al. Listen, Attend and Spell , 2015, ArXiv.
[80] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[81] Mark Steedman,et al. A massively parallel corpus: the Bible in 100 languages , 2014, Lang. Resour. Evaluation.
[82] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[83] Mark J. F. Gales,et al. Speech recognition and keyword spotting for low-resource languages: Babel project research at CUED , 2014, SLTU.
[84] Simon Dixon,et al. pYIN: A fundamental frequency estimator using probabilistic threshold distributions , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[85] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[86] Georg Heigold,et al. Multilingual acoustic models using distributed deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[87] Philip N. Garner,et al. Current trends in multilingual speech processing , 2011.
[88] Kenneth Heafield,et al. KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.
[89] Cha Zhang,et al. CrowdMOS: An approach for crowdsourcing mean opinion score studies , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[90] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011.
[91] Cordelia Schmid,et al. Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[92] Kai Feng,et al. Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[93] Su-Youn Yoon,et al. A Python Toolkit for Universal Transliteration , 2010, LREC.
[94] Hui Lin,et al. A study on multilingual acoustic modeling for large vocabulary ASR , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[95] Su-Youn Yoon,et al. Multilingual Transliteration Using Feature based Phonetic Method , 2007, ACL.
[96] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[97] Sarah L. Nesbeitt. Ethnologue: Languages of the World , 1999.
[98] John C. Wells,et al. Computer-coding the IPA: a proposed extension of SAMPA , 1995.
[99] Worldbet,et al. ASCII Phonetic Symbols for the World's Languages: Worldbet , 1994.