[1] Adam Coates, et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[2] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2018, ICASSP.
[3] Xu Tan, et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.
[4] Carlos Busso, et al. Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings, 2019, IEEE Transactions on Affective Computing.
[5] Wei Ping, et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech, 2018, ICLR.
[6] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, ICASSP.
[7] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984.
[8] Morgan Sonderegger, et al. Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi, 2017, INTERSPEECH.
[9] Soroosh Mariooryad, et al. Semi-Supervised Generative Modeling for Controllable Speech Synthesis, 2019, ICLR.
[10] Zhizheng Wu, et al. Merlin: An Open Source Neural Network Speech Synthesis System, 2016, SSW.
[11] Tao Qin, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2021, ICLR.
[12] Sercan Ömer Arik, et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech, 2017, NIPS.
[13] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[14] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[15] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.
[16] Marc Schröder, et al. Emotional speech synthesis: a review, 2001, INTERSPEECH.
[17] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, ICASSP.
[18] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[19] Sercan Ömer Arik, et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech, 2018, ICLR.
[20] Xu Tan, et al. Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion, 2019, INTERSPEECH.
[21] Yoshua Bengio, et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis, 2019, NeurIPS.
[22] J. Russell. A circumplex model of affect, 1980.
[23] Patrick Nguyen, et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, 2018, NeurIPS.
[24] Thomas Drugman, et al. CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech, 2020, INTERSPEECH.
[25] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[26] Quan Wang, et al. Generalized End-to-End Loss for Speaker Verification, 2018, ICASSP.
[27] Soroosh Mariooryad, et al. Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis, 2019, arXiv.
[28] Frank K. Soong, et al. Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis, 2015, ICASSP.
[29] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.