Learning continuous representation of text for phone duration modeling in statistical parametric speech synthesis
Suryakanth V. Gangashetty | Sai Krishna Rallabandi | Sai Sirisha Rallabandi | Padmini Bandi
[1] Susan T. Dumais,et al. Learned Vector-Space Models for Document Retrieval , 1995, Inf. Process. Manag..
[2] Alan W. Black,et al. Issues in building general letter to sound rules , 1998, SSW.
[3] Paul Taylor,et al. The architecture of the Festival speech synthesis system , 1998, SSW.
[4] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[5] Katie McGrath,et al. Language Identification and Language Specific Letter-to-Sound Rules , 2004 .
[6] Simon King,et al. Bayesian networks for phone duration prediction , 2008, Speech Commun..
[7] Bayya Yegnanarayana,et al. Modeling syllable duration in Indian languages using neural networks , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[8] Lukás Burget,et al. Extensions of recurrent neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Yiyu Yao,et al. An analysis of vector space models based on computational geometry , 1992, SIGIR '92.
[10] Jan P. H. van Santen,et al. Contextual effects on vowel duration , 1992, Speech Commun..
[11] Takao Kobayashi,et al. Phone duration modeling using gradient tree boosting , 2008, Speech Commun..
[12] Eric Moulines,et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..
[13] Alan W. Black,et al. CLUSTERGEN: a statistical parametric synthesizer using trajectory modeling , 2006, INTERSPEECH.
[14] Simon King,et al. Multidimensional scaling of listener responses to synthetic speech , 2005, INTERSPEECH.
[15] Mohsen Rashwan,et al. Duration modeling for Arabic text to speech synthesis , 2002, INTERSPEECH.
[16] Rodney W. Johnson,et al. Automatic translation of English text to phonetics by means of letter-to-sound rules (NRL report 794 , 1976 .
[17] J. Friedman,et al. Multivariate adaptive regression splines , 1991 .
[18] Oliver Watts,et al. Unsupervised learning for text-to-speech synthesis , 2013 .
[19] Ganesh Ramakrishnan,et al. MILE TTS for Tamil and Kannada for blizzard challenge 2013 , 2013 .
[20] Alan W Black,et al. Festvox : Tools for Creation and Analyses of Large Speech Corpora , 2010 .
[21] Marcel Riedi,et al. Modeling segmental duration with multivariate adaptive regression splines , 1997, EUROSPEECH.
[22] Alan W. Black,et al. Letter to sound rules for accented lexicon compression , 1998, ICSLP.
[23] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[24] Paul Taylor,et al. Festival Speech Synthesis System , 1998 .
[25] Dennis H. Klatt,et al. Perception of Segment Duration in Sentence Contexts , 1975 .
[26] Heiga Zen,et al. Hidden semi-Markov model based speech synthesis , 2004, INTERSPEECH.
[27] Omer Levy,et al. Neural Word Embedding as Implicit Matrix Factorization , 2014, NIPS.
[28] Keiichi Tokuda,et al. The blizzard challenge - 2005: evaluating corpus-based speech synthesis on common datasets , 2005, INTERSPEECH.
[29] Nikos Fakotakis,et al. Improving phone duration modelling using support vector regression fusion , 2011, Speech Commun..
[30] Kishore Prahallad,et al. Automatic Building of Synthetic Voices from Audio Books , 2010 .
[31] Xiaochuan Niu,et al. Prediction and synthesis of prosodic effects on spectral balance of vowels , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002..
[32] John D. Lafferty,et al. A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval , 2017, SIGF.