Qing He | Yang Gao | Weiyi Zheng | Zhaojun Yang | Thilo Köhler | Christian Fuegen
[1] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[2] K. Scherer, et al. Acoustic profiles in vocal emotion expression, 1996, Journal of Personality and Social Psychology.
[3] Kyomin Jung, et al. Multimodal Speech Emotion Recognition Using Audio and Text, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[4] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[5] Takao Kobayashi, et al. Modeling of various speaking styles and emotions for HMM-based speech synthesis, 2003, INTERSPEECH.
[6] George Trigeorgis, et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[8] K. Scherer, et al. Vocal cues in emotion encoding and decoding, 1991.
[9] Maja J. Mataric, et al. A Framework for Automatic Human Emotion Classification Using Emotion Profiles, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[10] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[11] Srikanth Ronanki, et al. The Blizzard Challenge 2017, 2017.
[12] Dong Yu, et al. Speech emotion recognition using deep neural network and extreme learning machine, 2014, INTERSPEECH.
[13] Bernhard Schölkopf, et al. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations, 2018, ICML.
[14] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Takao Kobayashi, et al. Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[16] Jiaying Liu, et al. Adaptive Batch Normalization for practical domain adaptation, 2018, Pattern Recognit.
[17] J. Yamagishi, et al. HMM-Based Speech Synthesis with Various Speaking Styles Using Model Interpolation, 2004.
[18] Carlos Busso, et al. IEMOCAP: interactive emotional dyadic motion capture database, 2008, Lang. Resour. Evaluation.
[19] Carlos Busso, et al. Emotion recognition using a hierarchical binary decision tree approach, 2011, Speech Commun.
[20] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[21] Junichi Yamagishi, et al. Principles for Learning Controllable TTS from Annotated and Latent Variation, 2017, INTERSPEECH.
[22] Srikanth Ronanki, et al. Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data, 2018, INTERSPEECH.
[23] Thierry Dutoit, et al. Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis, 2019, INTERSPEECH.
[24] Yuxuan Wang, et al. Uncovering Latent Style Factors for Expressive Speech Synthesis, 2017, ArXiv.