Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Jinyu Li | Huaming Wang | Furu Wei | Zhuo Chen | Lei He | Yu Wu | Shujie Liu | Sheng Zhao | Yanqing Liu | Sanyuan Chen | Chengyi Wang | Long Zhou | Zi-Hua Zhang
[1] Jinyu Li,et al. Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers , 2023, ArXiv.
[2] C. Chiu,et al. Textless Direct Speech-to-Speech Translation with Discrete Speech Representation , 2022, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Jinyu Li,et al. Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation , 2022, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Chao Weng,et al. Diffsound: Discrete Diffusion Model for Text-to-Sound Generation , 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[5] Ming Li,et al. Cross-lingual multi-speaker speech synthesis with limited bilingual training data , 2022, Comput. Speech Lang.
[6] Jingfei Du,et al. SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations , 2022, ArXiv.
[7] Joun Yeop Lee,et al. An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space , 2022, ArXiv.
[8] June Sig Sung,et al. Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation , 2022, ArXiv.
[9] Gabriel Synnaeve,et al. High Fidelity Neural Audio Compression , 2022, ArXiv.
[10] Jinyu Li,et al. SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training , 2022, EMNLP.
[11] Yaniv Taigman,et al. AudioGen: Textually Guided Audio Generation , 2022, ICLR.
[12] Jinyu Li,et al. SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data , 2022, ArXiv.
[13] Tanja Schultz,et al. Normalization of code-switched text for speech synthesis , 2022, INTERSPEECH.
[14] David Grangier,et al. AudioLM: A Language Modeling Approach to Audio Generation , 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[15] Yi Ren,et al. TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation , 2022, ICLR.
[16] Tie-Yan Liu,et al. NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[17] Tao Wang,et al. GigaST: A 10,000-hour Pseudo Speech Translation Corpus , 2022, ArXiv.
[18] Lei He,et al. Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training , 2022, ArXiv.
[19] Michelle Tadmor Ramanovich,et al. CVSS Corpus and Massively Multilingual Speech-to-Speech Translation , 2022, LREC.
[20] H. Schwenk,et al. Textless Speech-to-Speech Translation on Real Data , 2021, NAACL.
[21] Arnaldo Cândido Júnior,et al. YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone , 2021, ICML.
[22] Jinyu Li,et al. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing , 2021, IEEE Journal of Selected Topics in Signal Processing.
[23] Lei Xie,et al. WENETSPEECH: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Michelle Tadmor Ramanovich,et al. Translatotron 2: High-quality direct speech-to-speech translation with voice preservation , 2021, ICML.
[25] A. Polyak,et al. Direct Speech-to-Speech Translation With Discrete Units , 2021, ACL.
[26] Marco Tagliasacchi,et al. SoundStream: An End-to-End Neural Audio Codec , 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[27] Chung-Cheng Chiu,et al. w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[28] Ruslan Salakhutdinov,et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[29] Xiangang Li,et al. GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio , 2021, Interspeech.
[30] Emmanuel Dupoux,et al. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation , 2021, ACL.
[31] Bryan Catanzaro,et al. DiffWave: A Versatile Diffusion Model for Audio Synthesis , 2020, ICLR.
[32] Michelle Tadmor Ramanovich,et al. Translatotron 2: Robust direct speech-to-speech translation , 2021, ArXiv.
[33] Sebastian Möller,et al. Deep Learning Based Assessment of Synthetic Speech Naturalness , 2020, INTERSPEECH.
[34] Jingzhou Yang,et al. Towards Universal Text-to-Speech , 2020, INTERSPEECH.
[35] Brian Kan-Wing Mak,et al. Multi-Lingual Multi-Speaker Text-to-Speech Synthesis for Voice Cloning with Online Speaker Enrollment , 2020, INTERSPEECH.
[36] Shengkui Zhao,et al. Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion , 2020, INTERSPEECH.
[37] Jaehyeon Kim,et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis , 2020, NeurIPS.
[38] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[39] Songxiang Liu,et al. Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Abdel-rahman Mohamed,et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Chunghyun Ahn,et al. Emotional Speech Synthesis with Rich and Granularized Control , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Heiga Zen,et al. Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning , 2019, INTERSPEECH.
[43] Xu Tan,et al. FastSpeech: Fast, Robust and Controllable Text to Speech , 2019, NeurIPS.
[44] Melvin Johnson,et al. Direct speech-to-speech translation with a sequence-to-sequence model , 2019, INTERSPEECH.
[45] Lior Wolf,et al. Unsupervised Polyglot Text-to-speech , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[46] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[47] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[48] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[49] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[50] M. Wester. The EMIME Bilingual Database , 2010.
[51] Satoshi Nakamura,et al. The ATR Multilingual Speech-to-Speech Translation System , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[52] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[53] Wolfgang Wahlster,et al. Verbmobil: Foundations of Speech-to-Speech Translation , 2000, Artificial Intelligence.
[54] Alon Lavie,et al. Janus-III: speech-to-speech translation in multiple languages , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.