GPR-based Thai speech synthesis using multi-level duration prediction
[1] Jan P. H. van Santen, et al. Contextual effects on vowel duration, 1992, Speech Commun.
[2] W. Nick Campbell. Predicting segmental durations for accommodation within a syllable-level timing framework, 1993, EUROSPEECH.
[3] Zhizheng Wu, et al. Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[4] Takao Kobayashi, et al. Phone duration modeling using gradient tree boosting, 2008, Speech Commun.
[5] Sin-Horng Chen, et al. A new duration modeling approach for Mandarin speech, 2003, IEEE Trans. Speech Audio Process.
[6] Srikanth Ronanki, et al. Robust TTS duration modelling using DNNs, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Takao Kobayashi, et al. A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data, 2015, INTERSPEECH.
[8] Takashi Nose, et al. Parametric speech synthesis based on Gaussian process regression using global variance and hyperparameter optimization, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Takao Kobayashi, et al. Design of tree-based context clustering for an HMM-based Thai speech synthesis system, 2007, SSW.
[10] Heiga Zen, et al. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.
[12] Inma Hernáez, et al. A Hybrid TTS Approach for Prosody and Acoustic Modules, 2011, INTERSPEECH.
[13] Takashi Nose, et al. Statistical nonparametric speech synthesis using sparse Gaussian processes, 2013, INTERSPEECH.
[14] Simon King, et al. Bayesian networks for phone duration prediction, 2008, Speech Commun.
[15] Sadaoki Furui, et al. Thai speech processing technology: A review, 2007, Speech Commun.
[16] Stephen Isard, et al. Segment durations in a syllable frame, 1991.
[17] Mary P. Harper, et al. Vowel length and stress in Thai, 1998.
[18] Takashi Nose, et al. Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[19] Zhizheng Wu, et al. Duration refinement by jointly optimizing state and longer unit likelihood, 2008, INTERSPEECH.
[20] Takao Kobayashi, et al. Tone correctness improvement in speaker dependent HMM-based Thai speech synthesis, 2008, Speech Commun.
[21] Géza Németh, et al. DNN-Based Duration Modeling for Synthesizing Short Sentences, 2016, SPECOM.
[22] Heiga Zen, et al. Statistical parametric speech synthesis using deep neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[23] Takao Kobayashi, et al. Duration prediction using multi-level model for GPR-based speech synthesis, 2015, INTERSPEECH.
[24] Nikos Fakotakis, et al. Two-stage phone duration modelling with feature construction and feature vector extension for the needs of speech synthesis, 2012, Comput. Speech Lang.
[25] Takao Kobayashi, et al. Implementation and evaluation of an HMM-based Thai speech synthesis system, 2007, INTERSPEECH.
[26] Frank K. Soong, et al. On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Heiga Zen, et al. Hidden Semi-Markov Model Based Speech Synthesis System, 2006.
[28] Yoshinori Sagisaka, et al. Statistical modelling of speech segment duration by constrained tree regression, 2000.
[29] Koichi Shinoda, et al. MDL-based context-dependent subword modeling for speech recognition, 2000.
[30] M. P. Harper, et al. Acoustic Correlates of Stress in Thai, 1996, Phonetica.
[31] Bayya Yegnanarayana, et al. Modeling durations of syllables using neural networks, 2007, Comput. Speech Lang.
[32] Diamantino Freitas, et al. Segmental durations predicted with a neural network, 2003, INTERSPEECH.
[33] Nikos Fakotakis, et al. Improving phone duration modelling using support vector regression fusion, 2011, Speech Commun.
[34] Takashi Nose, et al. Statistical Parametric Speech Synthesis Based on Gaussian Process Regression, 2014, IEEE Journal of Selected Topics in Signal Processing.
[35] Yang Wang, et al. Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis, 2015, INTERSPEECH.
[36] Keiichi Tokuda, et al. Duration modeling for HMM-based speech synthesis, 1998, ICSLP.
[37] Philip N. Garner, et al. SVR vs MLP for Phone Duration Modelling in HMM-based Speech Synthesis, 2014.
[38] Takao Kobayashi, et al. Prosody generation using frame-based Gaussian process regression and classification for statistical parametric speech synthesis, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Sudaporn Luksaneeyanawin, et al. Intonation in Thai, 1983.