Pauses and respiratory markers of the structure of book reading

The automatic reading of books by text-to-speech synthesizers requires more than an adequate encoding of the many levels of information and discourse structure in the acoustic signal. It also requires proper breathing patterns, so as to pace information and organize discourse at an ecological rhythm. We analyze the locations and durations of nearly 4,000 pauses produced by a voice donor who read several audiobooks that are freely available on the web. Because the voice was recorded with a close-talk microphone, we also characterize the acoustic markers of inhalation. We show that the delay between the end of phonation and the onset of air intake can be considered an additional marker of thematic continuity between two adjacent speech chunks. This delay complements well-documented prosodic cues such as the preboundary tone, preboundary lengthening, and pause duration.
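The abstract relies on two per-pause measurements: the total pause duration and the delay between the end of phonation and the onset of air intake. The following is a minimal sketch of how both could be computed from time-aligned annotations and summarized per boundary type. The `Pause` record, its field names, the boundary labels, and the toy timestamps are all hypothetical assumptions; the authors' actual annotation scheme and statistical analysis are not described here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Pause:
    """Hypothetical annotation of one silent interval between two speech chunks.

    All times are in seconds from the start of the recording.
    """
    phonation_end: float               # end of voicing in the preceding chunk
    inhalation_start: Optional[float]  # onset of audible air intake, or None if no breath
    phonation_start: float             # onset of voicing in the following chunk
    boundary: str                      # hypothetical label, e.g. "sentence" or "paragraph"


def pause_duration(p: Pause) -> float:
    """Total silent interval between the two chunks."""
    return p.phonation_start - p.phonation_end


def inhalation_delay(p: Pause) -> Optional[float]:
    """Delay from end of phonation to onset of air intake (None if no breath)."""
    if p.inhalation_start is None:
        return None
    return p.inhalation_start - p.phonation_end


def summarize(pauses: list[Pause]) -> dict:
    """Count, mean pause duration and mean inhalation delay per boundary label."""
    groups: dict[str, list[Pause]] = defaultdict(list)
    for p in pauses:
        groups[p.boundary].append(p)
    summary = {}
    for label, ps in groups.items():
        # Pauses without an audible breath contribute to duration but not to delay.
        delays = [d for d in map(inhalation_delay, ps) if d is not None]
        summary[label] = {
            "n": len(ps),
            "mean_pause": mean(pause_duration(p) for p in ps),
            "mean_delay": mean(delays) if delays else None,
        }
    return summary


# Toy, made-up timestamps for illustration only (not data from the study).
toy = [
    Pause(1.0, 1.25, 1.75, "sentence"),
    Pause(5.0, 5.5, 6.0, "sentence"),
    Pause(3.0, None, 3.25, "phrase"),
]
print(summarize(toy))
```

Keeping the inhalation delay separate from the total pause duration makes it possible to test whether the delay carries information beyond pause length, which is the kind of complementarity the abstract claims.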
