Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis
[1] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[2] Sercan Ömer Arik,et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech , 2017, ICLR 2018.
[3] Ming Li,et al. From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint , 2020, INTERSPEECH.
[4] Horia Cucu,et al. Kaldi-based DNN Architectures for Speech Recognition in Romanian , 2019, 2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD).
[5] Wei Ping,et al. Multi-Speaker End-to-End Speech Synthesis , 2019, ArXiv.
[6] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[7] Adam Coates,et al. Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.
[8] Hideyuki Tachibana,et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Patrick Nguyen,et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis , 2018, NeurIPS.
[10] Xin Wang,et al. Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[12] Shinji Watanabe,et al. Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Thomas Merritt,et al. Low-resource expressive text-to-speech using data augmentation , 2020.
[14] Eric Moulines,et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun.
[15] Tao Qin,et al. MultiSpeech: Multi-Speaker Text to Speech with Transformer , 2020, INTERSPEECH.
[16] Shinji Watanabe,et al. Learning Speaker Embedding from Text-to-Speech , 2020, INTERSPEECH.
[17] Bogdan Orza,et al. The SWARA speech corpus: A large parallel Romanian read speech dataset , 2017, 2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD).
[18] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008.
[19] Bryan Catanzaro,et al. Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis , 2021, ICLR.
[20] Wei Ping,et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech , 2018, ICLR.
[21] Joon Son Chung,et al. In defence of metric learning for speaker recognition , 2020, INTERSPEECH.
[22] Seong-Whan Lee,et al. Mel-spectrogram augmentation for sequence to sequence voice conversion , 2020, ArXiv.
[23] Zhaoyu Liu,et al. Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers , 2019, arXiv:1911.11601.
[24] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[25] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[26] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[27] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Xu Tan,et al. FastSpeech: Fast, Robust and Controllable Text to Speech , 2019, NeurIPS.