The IBM expressive text-to-speech synthesis system for American English
Michael Picheny | Raimo Bakis | John F. Pitrelli | Ellen Eide | Raul Fernandez | Wael Hamza
[1] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[2] Ann K. Syrdal, et al. Inter-transcriber reliability of ToBI prosodic labeling, 2000, INTERSPEECH.
[3] Alan W. Black, et al. Generating F0 contours from ToBI labels using linear regression, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[4] Hisashi Kihara, et al. Digital audio signal processing, 1990.
[5] R. Bogartz. An introduction to the analysis of variance, 1994.
[6] C. W. Wightman. ToBI or not ToBI?, 2002.
[7] Raimo Bakis, et al. Reconciling pronunciation differences between the front-end and the back-end in the IBM speech synthesis system, 2004, INTERSPEECH.
[8] Clifford Nass, et al. Perceptual user interfaces: perceptual bandwidth, 2000, CACM.
[9] Julia Hirschberg, et al. Automatic ToBI prediction and alignment to speed manual labeling of prosody, 2001, Speech Commun.
[10] E. Eide. Preservation, identification, and use of emotion in a text-to-speech system, 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.
[11] Robert E. Donovan, et al. A new distance measure for costing spectral discontinuities in concatenative speech synthesizers, 2001, SSW.
[12] Mari Ostendorf, et al. ToBI: a standard for labeling English prosody, 1992, ICSLP.
[13] Shrikanth S. Narayanan, et al. Expressive speech synthesis using a concatenative synthesizer, 2002, INTERSPEECH.
[14] Michael Picheny, et al. Context dependent vector quantization for continuous speech recognition, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[15] Julia Hirschberg, et al. Evaluation of prosodic transcription labeling reliability in the ToBI framework, 1994, ICSLP.
[16] Udo Zoelzer. Digital Audio Signal Processing, 2008.
[17] Matthew J. Makashay, et al. Corpus-based techniques in the AT&T NextGen synthesis system, 2000, INTERSPEECH.
[18] Paul Taylor, et al. Automatically clustering similar units for unit selection in speech synthesis, 1997, EUROSPEECH.
[19] J. F. Pitrelli, et al. Expressive speech synthesis using American English ToBI: questions and contrastive emphasis, 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[20] Justin Fackrell, et al. Segment selection in the L&H Realspeak laboratory TTS system, 2000, INTERSPEECH.
[21] Robert E. Donovan, et al. Data-driven segment preselection in the IBM trainable speech synthesis system, 2002, INTERSPEECH.
[22] Philip C. Woodland, et al. Improvements in an HMM-based speech synthesiser, 1995, EUROSPEECH.
[23] Giuseppe Riccardi, et al. Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events, 1999, EUROSPEECH.
[24] Hartmut R. Pfitzinger. Intrinsic phone durations are speaker-specific, 2002, INTERSPEECH.
[25] Raimo Bakis, et al. Multilayered extensions to the speech synthesis markup language for describing expressiveness, 2003, INTERSPEECH.