Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence

[1]  S. Rochester.  The significance of pauses in spontaneous speech , 1973, Journal of psycholinguistic research.

[2]  R. Quirk,et al.  A Corpus of English Conversation , 1980 .

[3]  Alan Garnham,et al.  Slips of the tongue in the London-Lund corpus of spontaneous conversation , 1981 .

[4]  Anne Cutler,et al.  Prosodic marking in speech repair , 1983 .

[5]  W. Levelt,et al.  Monitoring and self-repair in speech , 1983, Cognition.

[6]  Helen M. Marcus-Roberts,et al.  Meaningless Statistics , 1987 .

[7]  Jacqueline C. Kowtko,et al.  Data Collection and Analysis in the Air Travel Planning Domain , 1989, HLT.

[8]  W. Levelt,et al.  Speaking: From Intention to Articulation , 1990 .

[9]  W. Francis,et al.  The London-Lund Corpus of Spoken English: Description and Research , 1992 .

[10]  Jared Bernstein.  Corpus Collection for ATIS , 1991, HLT.

[11]  D. O'Shaughnessy,et al.  Recognition of hesitations in spontaneous speech , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Elizabeth Shriberg,et al.  Intonation of clause-internal filled pauses , 1992, ICSLP.

[13]  John J. Godfrey,et al.  SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  C H Nakatani,et al.  A corpus-based study of repair cues in spontaneous speech. , 1994, The Journal of the Acoustical Society of America.

[15]  Alexander I. Rudnicky,et al.  Expanding the Scope of the ATIS Task: The ATIS-3 Corpus , 1994, HLT.

[16]  Elizabeth Shriberg,et al.  Preliminaries to a Theory of Speech Disfluencies , 1994 .

[17]  J. E. Tree.  The Effects of False Starts and Repetitions on the Processing of Subsequent Words in Spontaneous Speech , 1995 .

[18]  Elmar Nöth,et al.  Filled pauses in spontaneous speech , 1995 .

[19]  Andreas Stolcke,et al.  Statistical language modeling for speech disfluencies , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[20]  Andreas Stolcke,et al.  A prosody only decision-tree model for disfluency detection , 1997, EUROSPEECH.

[21]  Ralph L. Rose.  The Communicative Value of Filled Pauses in Spontaneous Speech , 1998 .

[22]  Nick Campbell.  Where is the information in speech? (and to what extent can it be modelled in synthesis?) , 1998, SSW.

[23]  Levent M. Arslan,et al.  Speaker transformation using sentence HMM based alignments and detailed prosody modification , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[24]  Tatsuya Kawahara,et al.  Prosodic analysis of fillers and self-repair in Japanese speech , 1998, ICSLP.

[25]  Paul Taylor,et al.  The tilt intonation model , 1998, ICSLP.

[26]  Paul Taylor,et al.  Using decision trees within the tilt intonation model to predict F0 contours , 1999, EUROSPEECH.

[27]  Shu-Chuan Tseng.  Grammar, prosody and speech disfluencies in spoken dialogues , 1999 .

[28]  Serguei V. S. Pakhomov.  Modeling Filled Pauses in Medical Dictations , 1999, ACL.

[29]  Elizabeth Shriberg,et al.  Phonetic Consequences of Speech Disfluency , 1999 .

[30]  Mario Refice,et al.  Acoustic Cues for Classifying Communicative Intentions in Dialogue Systems , 2000, TSD.

[31]  Jean-Pierre Martens,et al.  Orthographic Transcription of the Spoken Dutch Corpus , 2000, LREC.

[32]  Douglas D. O'Shaughnessy,et al.  Detection of filled pauses in spontaneous conversational speech , 2000, INTERSPEECH.

[33]  J. E. Tree.  Listeners' uses of um and uh in speech comprehension. , 2001 .

[34]  H. H. Clark,et al.  Using uh and um in spontaneous speaking , 2002, Cognition.

[35]  Shrikanth Narayanan,et al.  Spoken language synthesis: experiments in synthesis of spontaneous monologues , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.

[36]  Julia Hirschberg,et al.  Communication and prosody: Functional aspects of prosody , 2002, Speech Commun.

[37]  Mark Huckvale,et al.  The reliability of the ITU-T P.85 standard for the evaluation of text-to-speech systems , 2002, INTERSPEECH.

[38]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[39]  Herbert H. Clark,et al.  Speaking in time , 2002, Speech Commun.

[40]  E. Eide.  Preservation, identification, and use of emotion in a text-to-speech system , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.

[41]  Hyunsong Chung.  Duration Models and the Perceptual Evaluation of Spoken Korean , 2002 .

[42]  Victoria Arranz,et al.  Lexica and corpora for speech-to-speech translation: a trilingual approach , 2003, INTERSPEECH.

[43]  Shrikanth S. Narayanan,et al.  An empirical text transformation method for spontaneous speech synthesizers , 2003, INTERSPEECH.

[44]  Paul Boersma,et al.  Praat: doing phonetics by computer , 2003 .

[45]  Michael Picheny,et al.  The IBM expressive speech synthesis system , 2004, INTERSPEECH.

[46]  Antonio Bonafonte,et al.  Intonation modeling for TTS using a joint extraction and prediction approach , 2004, SSW.

[47]  Hema A. Murthy,et al.  Duration modeling of Indian languages Hindi and Telugu , 2004, SSW.

[48]  D. O’Connell,et al.  The History of Research on the Filled Pause as Evidence of The Written Language Bias in Linguistics (Linell, 1982) , 2004, Journal of psycholinguistic research.

[49]  Jordi Adell,et al.  Comparative study of automatic phone segmentation methods for TTS , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[50]  Yuan Zhao,et al.  A preliminary study of Mandarin filled pauses , 2005, DiSS.

[51]  E. Eide,et al.  Conversational computers. , 2005, Scientific American.

[52]  Christina L. Bennett.  Large scale evaluation of corpus-based synthesizers: results and lessons from the Blizzard Challenge 2005 , 2005, INTERSPEECH.

[53]  D. O’Connell,et al.  Uh and Um Revisited: Are They Interjections for Signaling Delay? , 2005, Journal of psycholinguistic research.

[54]  Keikichi Hirose,et al.  Filled pauses as cues to the complexity of following phrases , 2005, INTERSPEECH.

[55]  Antonio Bonafonte,et al.  Spanish Synthesis Corpora , 2006, LREC.

[56]  Antonio Bonafonte,et al.  GAIA: Common Framework for the Development of Speech Translation Technologies , 2006, LREC.

[57]  Antonio Bonafonte,et al.  Ogmios: The UPC Text-to-Speech synthesis system for Spoken Translation , 2006 .

[58]  Rolf Carlson,et al.  Cues for hesitation in speech synthesis , 2006, INTERSPEECH.

[59]  Patrick Wambacq,et al.  Coping with disfluencies in spontaneous speech recognition: Acoustic detection and linguistic context manipulation , 2006, Speech Commun.

[60]  Jordi Adell,et al.  Disfluent Speech Analysis and Synthesis: a preliminary approach. , 2006 .

[61]  B. Schmidt-Nielsen,et al.  Living History , 2006 .

[62]  Nick Campbell,et al.  Evaluation of Speech Synthesis: From Reading Machines to Talking Machines , 2007 .

[63]  Simon King,et al.  The Blizzard Challenge 2007 , 2007 .

[64]  Heiga Zen,et al.  Statistical Parametric Speech Synthesis , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[65]  David Escudero Mancebo,et al.  Filled Pauses in Speech Synthesis: Towards Conversational Speech , 2007, TSD.

[66]  David Escudero Mancebo,et al.  Applying data mining techniques to corpus based prosodic modeling , 2007, Speech Commun.

[67]  Simon King,et al.  Multisyn: Open-domain unit selection for the Festival speech synthesis system , 2007, Speech Commun.

[68]  Joan Claudi Socoró,et al.  Prosody Modelling of Spanish for Expressive Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[69]  Simon King,et al.  Statistical analysis of the Blizzard Challenge 2007 listening test results , 2007 .

[70]  Jordi Adell,et al.  Corpus and Voices for Catalan Speech Synthesis , 2008, LREC.

[71]  David Escudero Mancebo,et al.  On the generation of synthetic disfluent speech: local prosodic modifications caused by the insertion of editing terms , 2008, INTERSPEECH.

[72]  A. Bonafonte,et al.  Modelling Filled Pauses Prosody to Synthesise Disfluent Speech , 2009 .

[73]  Gregory W. Corder,et al.  Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach , 2009 .

[74]  Paul Taylor,et al.  Text-to-Speech Synthesis , 2009 .

[75]  David Escudero Mancebo,et al.  Synthesis of filled pauses based on a disfluent speech model , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[76]  Ibon Saratxaga,et al.  Emotion Conversion Based on Prosodic Unit Selection , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[77]  Eun-Ju Lee,et al.  The more humanlike, the better? How speech type and users' cognitive style affect social responses to computers , 2010, Comput. Hum. Behav.

[78]  Kallirroi Georgila,et al.  Prediction and Realisation of Conversational Characteristics by Utilising Spontaneous Speech for Unit Selection , 2010 .

[79]  Gregory W. Corder,et al.  Nonparametric Statistics : A Step-by-Step Approach , 2014 .