[1] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[2] Hirokazu Kameoka, et al. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks, 2017, ArXiv.
[3] Klaus R. Scherer, et al. Vocal communication of emotion: A review of research paradigms, 2003, Speech Commun.
[4] Haizhou Li, et al. Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data, 2020, Odyssey.
[5] Jan Skoglund, et al. LPCNet: Improving Neural Speech Synthesis Through Linear Prediction, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Nikos Paragios, et al. Deformable Medical Image Registration: A Survey, 2013, IEEE Transactions on Medical Imaging.
[7] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.
[8] Tetsuya Takiguchi, et al. GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features, 2012.
[9] Alain Trouvé, et al. Computing Large Deformation Metric Mappings via Geodesic Flows of Diffeomorphisms, 2005, International Journal of Computer Vision.
[10] Nicolas Charon, et al. Diffeomorphic Registration of Discrete Geometric Distributions, 2018, Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore.
[11] Michael I. Miller, et al. Landmark matching via large deformation diffeomorphisms, 2000, IEEE Trans. Image Process.
[12] Yuxuan Wang, et al. Uncovering Latent Style Factors for Expressive Speech Synthesis, 2017, ArXiv.
[13] Haizhou Li, et al. Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion, 2016, INTERSPEECH.
[14] Zhiyong Wu, et al. A Review of Deep Learning Based Speech Synthesis, 2019, Applied Sciences.
[15] Alan W. Black, et al. The CMU Arctic speech databases, 2004, SSW.
[16] Soumith Chintala, et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015, ICLR.
[17] Alexei A. Efros, et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[18] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[19] Ryan Prenger, et al. Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] L. Younes. Shapes and Diffeomorphisms, 2010.
[21] Ravi Shankar, et al. VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English, 2019, INTERSPEECH.
[22] Takumi Sugiyama, et al. A study report on "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", 2017.
[23] Soroosh Mariooryad, et al. Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis, 2019, ArXiv.
[24] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] D. Wegner, et al. Psychology (2nd Edition), 2011.
[26] Ravi Shankar, et al. Automated Emotion Morphing in Speech Based on Diffeomorphic Curve Registration and Highway Networks, 2019, INTERSPEECH.
[27] Ravi Shankar, et al. A Multi-Speaker Emotion Morphing Model Using Highway Networks and Maximum Likelihood Objective, 2019, INTERSPEECH.
[28] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.