YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
[1] João Paulo Teixeira, et al. TTS-Portuguese Corpus: A Corpus for Speech Synthesis in Brazilian Portuguese, 2020, Lang. Resour. Evaluation.
[2] Juheon Lee, et al. Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations, 2021, NeurIPS.
[3] Brejesh Lall, et al. Normalization Driven Zero-Shot Multi-Speaker Speech Synthesis, 2021, Interspeech.
[4] Tomoki Koriyama, et al. Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis, 2021, Interspeech.
[5] Tao Qin, et al. A Survey on Neural Speech Synthesis, 2021, ArXiv.
[6] Jungil Kong, et al. Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech, 2021, ICML.
[7] Qingyang Hong, et al. Light-TTS: Lightweight Multi-Speaker Multi-Lingual Text-to-Speech, 2021, ICASSP.
[8] Damian Borth, et al. NoiseVC: Towards High Quality Zero-Shot Voice Conversion, 2021, ArXiv.
[9] Sandra M. Aluísio, et al. SC-GlowTTS: An Efficient Zero-Shot Multi-Speaker Text-To-Speech Model, 2021, Interspeech.
[10] Xiang Hao, et al. FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement, 2020, ICASSP.
[11] Tie-Yan Liu, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2020, ICLR.
[12] Gabriel Synnaeve, et al. MLS: A Large-Scale Multilingual Dataset for Speech Research, 2020, Interspeech.
[13] Jaehyeon Kim, et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis, 2020, NeurIPS.
[14] Joon Son Chung, et al. Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020, 2020, ArXiv.
[15] Ondrej Dusek, et al. One Model, Many Languages: Meta-Learning for Multilingual Text-to-Speech, 2020, Interspeech.
[16] Abdel-rahman Mohamed, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.
[17] Sungwon Kim, et al. Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search, 2020, NeurIPS.
[18] Seungju Han, et al. Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding, 2020, Interspeech.
[19] João Paulo Teixeira, et al. End-to-End Speech Synthesis Applied to Brazilian Portuguese, 2020, ArXiv.
[20] Joon Son Chung, et al. In Defence of Metric Learning for Speaker Recognition, 2020, Interspeech.
[21] Francis M. Tyers, et al. Common Voice: A Massively-Multilingual Speech Corpus, 2019, LREC.
[22] Junichi Yamagishi, et al. Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-Art Neural Speaker Embeddings, 2019, ICASSP.
[23] Erich Elsen, et al. High Fidelity Speech Synthesis with Adversarial Networks, 2019, ICLR.
[24] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[25] Heiga Zen, et al. Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning, 2019, Interspeech.
[26] Mark Hasegawa-Johnson, et al. Zero-Shot Voice Style Transfer with Only Autoencoder Loss, 2019, ICML.
[27] Jianwei Yu, et al. End-to-End Code-Switched TTS with Mix of Monolingual Recordings, 2019, ICASSP.
[28] Heiga Zen, et al. LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech, 2019, Interspeech.
[29] Ryan Prenger, et al. WaveGlow: A Flow-Based Generative Network for Speech Synthesis, 2018, ICASSP.
[30] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[31] Joon Son Chung, et al. VoxCeleb2: Deep Speaker Recognition, 2018, Interspeech.
[32] Patrick Nguyen, et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, 2018, NeurIPS.
[33] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, ICASSP.
[34] Ming Li, et al. Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System, 2018, Odyssey.
[35] Sercan Ömer Arik, et al. Neural Voice Cloning with a Few Samples, 2018, NeurIPS.
[36] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, ICASSP.
[37] Quan Wang, et al. Generalized End-to-End Loss for Speaker Verification, 2017, ICASSP.
[38] Sercan Ömer Arik, et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech, 2017, ICLR.
[39] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, Interspeech.
[40] Samy Bengio, et al. Density Estimation Using Real NVP, 2016, ICLR.
[41] Junichi Yamagishi, et al. CSTR VCTK Corpus: English Multi-Speaker Corpus for CSTR Voice Cloning Toolkit, 2016.
[42] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[43] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[44] Cha Zhang, et al. CrowdMOS: An Approach for Crowdsourcing Mean Opinion Score Studies, 2011, ICASSP.