Supervised and unsupervised approaches for controlling narrow lexical focus in sequence-to-sequence speech synthesis
暂无分享,去创建一个
[1] Tomoki Toda,et al. A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive Training , 2016, INTERSPEECH.
[2] Taesu Kim,et al. Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] David Konopnicki,et al. Word Emphasis Prediction for Expressive Text to Speech , 2018, INTERSPEECH.
[4] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[5] Heiga Zen,et al. Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Geng Yang,et al. Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[7] Simon King,et al. Modelling prominence and emphasis improves unit-selection synthesis , 2007, INTERSPEECH.
[8] Slava Shechtman,et al. Emphatic Speech Prosody Prediction with Deep Lstm Networks , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Zhen-Hua Ling,et al. Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[11] Yuxuan Wang,et al. Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[12] Antoine Raux,et al. A unit selection approach to F0 modeling and its application to emphasis , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[13] Heiga Zen,et al. Hierarchical Generative Modeling for Controllable Speech Synthesis , 2018, ICLR.
[14] Sang Wan Lee,et al. Phonemic-level Duration Control Using Attention Alignment for Natural Speech Synthesis , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Tao Qin,et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech , 2021, ICLR.
[16] Lianhong Cai,et al. Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data , 2012, INTERSPEECH.
[17] Jan Skoglund,et al. LPCNET: Improving Neural Speech Synthesis through Linear Prediction , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Slava Shechtman,et al. Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities , 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[19] Slava Shechtman,et al. Controllable Sequence-To-Sequence Neural TTS with LPCNET Backend for Real-time Speech Synthesis on CPU , 2020 .
[20] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[21] Kai Yu,et al. Word-level emphasis modelling in HMM-based speech synthesis , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[22] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[23] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Xu Tan,et al. FastSpeech: Fast, Robust and Controllable Text to Speech , 2019, NeurIPS.
[25] Michael Picheny,et al. The IBM expressive text-to-speech synthesis system for American English , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[26] Ryan Prenger,et al. Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Bhuvana Ramabhadran,et al. Automatic exploration of corpus-specific properties for expressive text-to-speech: a case study in emphasis , 2007, SSW.