暂无分享,去创建一个
Seungju Han | Seungwoo Choi | Sungjoo Ha | Dongyoung Kim | Dongyoung Kim | Seungju Han | Seungwoo Choi | S. Ha
[1] Xin Wang,et al. Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Wei Ping,et al. Multi-Speaker End-to-End Speech Synthesis , 2019, ArXiv.
[3] Lior Wolf,et al. Fitting New Speakers Based on a Short Untranscribed Sample , 2018, ICML.
[4] Slava Shechtman,et al. High quality, lightweight and adaptable TTS using LPCNet , 2019, INTERSPEECH.
[5] Sercan Ömer Arik,et al. Neural Voice Cloning with a Few Samples , 2018, NeurIPS.
[6] Heiga Zen,et al. Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[8] James Glass,et al. Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Heiga Zen,et al. Hierarchical Generative Modeling for Controllable Speech Synthesis , 2018, ICLR.
[10] Bajibabu Bollepalli,et al. Lombard Speech Synthesis Using Transfer Learning in a Tacotron Text-to-Speech System , 2019, INTERSPEECH.
[11] Heiga Zen,et al. Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Colin Raffel,et al. librosa: Audio and Music Signal Analysis in Python , 2015, SciPy.
[13] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[14] Soroosh Mariooryad,et al. Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis , 2019, ArXiv.
[15] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Lei He,et al. Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS , 2019, INTERSPEECH.
[17] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[18] Junichi Yamagishi,et al. SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2016 .
[19] Taesu Kim,et al. Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[21] Quan Wang,et al. Generalized End-to-End Loss for Speaker Verification , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Heiga Zen,et al. Sample Efficient Adaptive Text-to-Speech , 2018, ICLR.
[23] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[24] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[25] Yuxuan Wang,et al. Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[26] Patrick Nguyen,et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis , 2018, NeurIPS.
[27] Erik Marchi,et al. Neural Text-to-Speech Adaptation from Low Quality Public Recordings , 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[28] Zhen-Hua Ling,et al. Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[30] Frank K. Soong,et al. Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice , 2018, ArXiv.
[31] Lei Chen,et al. Cross-Lingual, Multi-Speaker Text-To-Speech Synthesis Using Neural Speaker Embedding , 2019, INTERSPEECH.
[32] Wei Ping,et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech , 2018, ICLR.