Tan Lee | Ying Qin | Daxin Tan | Guangyan Zhang
[1] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Xu Tan, et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.
[3] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[4] Taesu Kim, et al. Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Alexander A. Alemi, et al. Deep Variational Information Bottleneck, 2017, ICLR.
[6] Alexander A. Alemi, et al. Fixing a Broken ELBO, 2017, ICML.
[7] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[8] Tan Lee, et al. Learning Syllable-Level Discrete Prosodic Representation for Expressive Speech Generation, 2020, INTERSPEECH.
[9] Paul Taylor, et al. The architecture of the Festival speech synthesis system, 1998, SSW.
[10] Tao Qin, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2021, ICLR.
[11] Guillaume Desjardins, et al. Understanding disentangling in $\beta$-VAE, 2018, arXiv:1804.03599.
[12] Benoît Sagot, et al. What Does BERT Learn about the Structure of Language?, 2019, ACL.
[13] Naftali Tishby, et al. The information bottleneck method, 2000, ArXiv.
[14] Joseph P. Olive, et al. Text-to-speech synthesis, 1995, AT&T Technical Journal.
[15] Heiga Zen, et al. Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Vincent Wan, et al. CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, 2019, ICML.
[17] Shujie Liu, et al. Neural Speech Synthesis with Transformer Network, 2018, AAAI.
[18] Alexei Baevski, et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations, 2019, ICLR.
[19] Duane G. Watson, et al. Experimental and theoretical advances in prosody: A review, 2010, Language and cognitive processes.
[20] Thomas Drugman, et al. CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech, 2020, INTERSPEECH.
[21] Gal Chechik, et al. Information Bottleneck for Gaussian Variables, 2003, J. Mach. Learn. Res.
[22] James Glass, et al. Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Oliver Watts, et al. Using generative modelling to produce varied intonation for speech synthesis, 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[24] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[25] Yoshua Bengio, et al. Mutual Information Neural Estimation, 2018, ICML.
[26] Oriol Vinyals, et al. Neural Discrete Representation Learning, 2017, NIPS.
[27] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[28] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[29] Tan Lee, et al. Estimating Mutual Information in Prosody Representation for Emotional Prosody Transfer in Speech Synthesis, 2021, 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[30] Simon King, et al. A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural $F_0$ Model for Statistical Parametric Speech Synthesis, 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[31] Naftali Tishby, et al. Deep learning and the information bottleneck principle, 2015, 2015 IEEE Information Theory Workshop (ITW).
[32] Christopher Burgess, et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, 2016, ICLR 2016.
[33] Tan Lee, et al. Fine-Grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement, 2020, INTERSPEECH.
[34] Xiaodong Liu, et al. Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing, 2019, NAACL.
[35] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[36] Ryuichi Yamamoto, et al. Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).