An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis
[1] Xuanjing Huang,et al. Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents , 2015, EMNLP.
[2] Li-Rong Dai,et al. Multi-Layer F0 Modeling for HMM-Based Speech Synthesis , 2008, 2008 6th International Symposium on Chinese Spoken Language Processing.
[3] Antoine Raux,et al. A unit selection approach to F0 modeling and its application to emphasis , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[4] Takashi Nose,et al. HMM-based speech synthesis with unsupervised labeling of accentual context based on F0 quantization and average voice model , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[5] Xuejing Sun. F0 generation for speech synthesis using a multi-tier approach , 2002, INTERSPEECH.
[6] Heiga Zen,et al. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Xin Wang,et al. Investigating very deep highway networks for parametric speech synthesis , 2018, Speech Commun.
[8] Keiichi Tokuda,et al. Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[9] Masami Akamine,et al. Multilevel parametric-base F0 model for speech synthesis , 2008, INTERSPEECH.
[10] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[11] Wenlin Chen,et al. Strategies for Training Large Vocabulary Neural Language Models , 2015, ACL.
[12] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[13] Yoshua Bengio,et al. Hierarchical Probabilistic Neural Network Language Model , 2005, AISTATS.
[14] Patricia Riddle,et al. Modelling and synthesising F0 contours with the discrete cosine transform , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] Robert A. J. Clark,et al. A multi-level representation of f0 using the continuous wavelet transform and the Discrete Cosine Transform , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Christof Traber. F0 generation with a data base of natural F0 patterns and with a neural network , 1990, SSW.
[17] G. Ayers,et al. Guidelines for ToBI labelling , 1994 .
[18] Andrew Rosenberg. Classification of Prosodic Events using Quantized Contour Modeling , 2010, HLT-NAACL.
[19] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[20] Santitham Prom-on,et al. Modeling tone and intonation in Mandarin and English as a process of target approximation. , 2009, The Journal of the Acoustical Society of America.
[21] Keikichi Hirose,et al. Analysis of voice fundamental frequency contours for declarative sentences of Japanese , 1984 .
[22] Paul Taylor,et al. Using decision trees within the tilt intonation model to predict F0 contours , 1999, EUROSPEECH.
[23] Sin-Horng Chen,et al. An RNN-based prosodic information synthesizer for Mandarin text-to-speech , 1998, IEEE Trans. Speech Audio Process.
[24] C. Gussenhoven. The phonology of tone and intonation , 2004 .
[25] Mandy Eberhart,et al. Speech Communications Human And Machine , 2016 .
[26] Björn W. Schuller,et al. Introducing CURRENNT: the Munich open-source CUDA recurrent neural network toolkit , 2015, J. Mach. Learn. Res.
[27] Bhuvana Ramabhadran,et al. Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks , 2014, INTERSPEECH.
[28] Kai Yu,et al. Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[29] Heiga Zen,et al. Speech Synthesis Based on Hidden Markov Models , 2013, Proceedings of the IEEE.
[30] Marc'Aurelio Ranzato,et al. Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.
[31] B. Moore. An introduction to the psychology of hearing, 3rd ed. , 1989 .
[32] Mark J. F. Gales,et al. Training a supra-segmental parametric F0 model without interpolating F0 , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[33] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun.
[34] E. Owens,et al. An Introduction to the Psychology of Hearing , 1997 .
[35] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[36] S. King,et al. The Blizzard Challenge 2011 , 2011 .
[37] Jürgen Schmidhuber,et al. A Clockwork RNN , 2014, ICML.
[38] Joram Meron. Prosodic unit selection using an imitation speech database , 2001, SSW.
[39] Ferenc Huszar,et al. How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary? , 2015, ArXiv.
[40] Keiichi Tokuda,et al. Multi-Space Probability Distribution HMM , 2002 .
[41] Y. Sagisaka,et al. On the prediction of global F0 shape for Japanese text-to-speech , 1990, International Conference on Acoustics, Speech, and Signal Processing.