Paper information - Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence - 字舞流文

Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence

David Escudero Mancebo | Jordi Adell | Antonio Bonafonte

[1] S. Rochester. The significance of pauses in spontaneous speech , 1973, Journal of psycholinguistic research.

[2] R. Quirk,et al. A Corpus of English Conversation , 1980 .

[3] Alan Garnham,et al. Slips of the tongue in the London-Lund corpus of spontaneous conversation , 1981 .

[4] Anne Cutler,et al. Prosodic marking in speech repair , 1983 .

[5] W. Levelt,et al. Monitoring and self-repair in speech , 1983, Cognition.

[6] Helen M. Marcus-Roberts,et al. Meaningless Statistics , 1987 .

[7] Jacqueline C. Kowtko,et al. Data Collection and Analysis in the Air Travel Planning Domain , 1989, HLT.

[8] W. Levelt,et al. Speaking: From Intention to Articulation , 1990 .

[9] W. Francis,et al. The London-Lund Corpus of Spoken English: Description and Research , 1992 .

[10] Jared Bernstein. Corpus Collection for ATIS , 1991, HLT.

[11] D. O'Shaughnessy,et al. Recognition of hesitations in spontaneous speech , 1992, ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12] Elizabeth Shriberg,et al. Intonation of clause-internal filled pauses , 1992, ICSLP.

[13] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14] C. H. Nakatani,et al. A corpus-based study of repair cues in spontaneous speech , 1994, The Journal of the Acoustical Society of America.

[15] Alexander I. Rudnicky,et al. Expanding the Scope of the ATIS Task: The ATIS-3 Corpus , 1994, HLT.

[16] Elizabeth Shriberg,et al. Preliminaries to a Theory of Speech Disfluencies , 1994 .

[17] J. E. Fox Tree. The Effects of False Starts and Repetitions on the Processing of Subsequent Words in Spontaneous Speech , 1995 .

[18] Elmar Nöth,et al. Filled pauses in spontaneous speech , 1995 .

[19] Andreas Stolcke,et al. Statistical language modeling for speech disfluencies , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[20] Andreas Stolcke,et al. A prosody only decision-tree model for disfluency detection , 1997, EUROSPEECH.

[21] Ralph L. Rose. The Communicative Value of Filled Pauses in Spontaneous Speech , 1998 .

[22] Nick Campbell. Where is the information in speech? (and to what extent can it be modelled in synthesis?) , 1998, SSW.

[23] Levent M. Arslan,et al. Speaker transformation using sentence HMM based alignments and detailed prosody modification , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.

[24] Tatsuya Kawahara,et al. Prosodic analysis of fillers and self-repair in Japanese speech , 1998, ICSLP.

[25] Paul Taylor,et al. The tilt intonation model , 1998, ICSLP.

[26] Paul Taylor,et al. Using decision trees within the tilt intonation model to predict F0 contours , 1999, EUROSPEECH.

[27] Shu-Chuan Tseng. Grammar, prosody and speech disfluencies in spoken dialogues , 1999 .

[28] Serguei V. S. Pakhomov. Modeling Filled Pauses in Medical Dictations , 1999, ACL.

[29] Elizabeth Shriberg,et al. Phonetic Consequences of Speech Disfluency , 1999 .

[30] Mario Refice,et al. Acoustic Cues for Classifying Communicative Intentions in Dialogue Systems , 2000, TSD.

[31] Jean-Pierre Martens,et al. Orthographic Transcription of the Spoken Dutch Corpus , 2000, LREC.

[32] Douglas D. O'Shaughnessy,et al. Detection of filled pauses in spontaneous conversational speech , 2000, INTERSPEECH.

[33] J. E. Fox Tree. Listeners' uses of um and uh in speech comprehension , 2001 .

[34] H. H. Clark,et al. Using uh and um in spontaneous speaking , 2002, Cognition.

[35] Shrikanth Narayanan,et al. Spoken language synthesis: experiments in synthesis of spontaneous monologues , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis.

[36] Julia Hirschberg,et al. Communication and prosody: Functional aspects of prosody , 2002, Speech Commun.

[37] Mark Huckvale,et al. The reliability of the ITU-T P.85 standard for the evaluation of text-to-speech systems , 2002, INTERSPEECH.

[38] Paul Boersma,et al. Praat, a system for doing phonetics by computer , 2002 .

[39] Herbert H. Clark,et al. Speaking in time , 2002, Speech Commun.

[40] E. Eide. Preservation, identification, and use of emotion in a text-to-speech system , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis.

[41] Hyunsong Chung. Duration Models and the Perceptual Evaluation of Spoken Korean , 2002 .

[42] Victoria Arranz,et al. Lexica and corpora for speech-to-speech translation: a trilingual approach , 2003, INTERSPEECH.

[43] Shrikanth S. Narayanan,et al. An empirical text transformation method for spontaneous speech synthesizers , 2003, INTERSPEECH.

[44] Paul Boersma,et al. Praat: doing phonetics by computer , 2003 .

[45] Michael Picheny,et al. The IBM expressive speech synthesis system , 2004, INTERSPEECH.

[46] Antonio Bonafonte,et al. Intonation modeling for TTS using a joint extraction and prediction approach , 2004, SSW.

[47] Hema A. Murthy,et al. Duration modeling of Indian languages Hindi and Telugu , 2004, SSW.

[48] D. O'Connell,et al. The History of Research on the Filled Pause as Evidence of The Written Language Bias in Linguistics (Linell, 1982) , 2004, Journal of psycholinguistic research.

[49] Jordi Adell,et al. Comparative study of automatic phone segmentation methods for TTS , 2005, ICASSP '05: IEEE International Conference on Acoustics, Speech, and Signal Processing.

[50] Yuan Zhao,et al. A preliminary study of Mandarin filled pauses , 2005, DiSS.

[51] E. Eide,et al. Conversational computers , 2005, Scientific American.

[52] Christina L. Bennett. Large scale evaluation of corpus-based synthesizers: results and lessons from the blizzard challenge 2005 , 2005, INTERSPEECH.

[53] D. O'Connell,et al. Uh and Um Revisited: Are They Interjections for Signaling Delay? , 2005, Journal of psycholinguistic research.

[54] Keikichi Hirose,et al. Filled pauses as cues to the complexity of following phrases , 2005, INTERSPEECH.

[55] Antonio Bonafonte,et al. Spanish Synthesis Corpora , 2006, LREC.

[56] Antonio Bonafonte,et al. GAIA: Common Framework for the Development of Speech Translation Technologies , 2006, LREC.

[57] Antonio Bonafonte,et al. Ogmios: The UPC Text-to-Speech synthesis system for Spoken Translation , 2006 .

[58] Rolf Carlson,et al. Cues for hesitation in speech synthesis , 2006, INTERSPEECH.

[59] Patrick Wambacq,et al. Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation , 2006, Speech Commun.

[60] Jordi Adell,et al. Disfluent Speech Analysis and Synthesis: a preliminary approach , 2006 .

[61] B. Schmidt-Nielsen,et al. Living History , 2006 .

[62] Nick Campbell,et al. Evaluation of Speech Synthesis: From Reading Machines to Talking Machines , 2007 .

[63] Simon King,et al. The Blizzard Challenge 2007 , 2007 .

[64] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[65] David Escudero Mancebo,et al. Filled Pauses in Speech Synthesis: Towards Conversational Speech , 2007, TSD.

[66] David Escudero Mancebo,et al. Applying data mining techniques to corpus based prosodic modeling , 2007, Speech Commun.

[67] Simon King,et al. Multisyn: Open-domain unit selection for the Festival speech synthesis system , 2007, Speech Commun.

[68] Joan Claudi Socoró,et al. Prosody Modelling of Spanish for Expressive Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[69] Simon King,et al. Statistical analysis of the Blizzard Challenge 2007 listening test results , 2007 .

[70] Jordi Adell,et al. Corpus and Voices for Catalan Speech Synthesis , 2008, LREC.

[71] David Escudero Mancebo,et al. On the generation of synthetic disfluent speech: local prosodic modifications caused by the insertion of editing terms , 2008, INTERSPEECH.

[72] A. Bonafonte,et al. Modelling Filled Pauses Prosody to Synthesise Disfluent Speech , 2009 .

[73] Gregory W. Corder,et al. Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach , 2009 .

[74] Paul Taylor,et al. Text-to-Speech Synthesis , 2009 .

[75] David Escudero Mancebo,et al. Synthesis of filled pauses based on a disfluent speech model , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[76] Ibon Saratxaga,et al. Emotion Conversion Based on Prosodic Unit Selection , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[77] Eun-Ju Lee,et al. The more humanlike, the better? How speech type and users' cognitive style affect social responses to computers , 2010, Comput. Hum. Behav.

[78] Kallirroi Georgila,et al. Prediction and Realisation of Conversational Characteristics by Utilising Spontaneous Speech for Unit Selection , 2010 .

[79] Gregory W. Corder,et al. Nonparametric Statistics : A Step-by-Step Approach , 2014 .