Training intonational phrasing rules automatically for English and Spanish text-to-speech

Abstract: We describe a procedure for automatically acquiring intonational phrasing rules for text-to-speech synthesis from annotated text, and we report an evaluation of this procedure for English and Spanish. The procedure uses decision trees generated automatically with Classification and Regression Tree (CART) techniques. The trees are trained on text corpora that native speakers have hand-labeled with likely locations of intonational boundaries, combined with information about the text obtained through simple text analysis. Rules generated by this method have been implemented in the English version of the Bell Laboratories Text-to-Speech System and have been developed for the Mexican Spanish version of that system. These rules currently achieve better than 95% accuracy for English and better than 94% for Spanish.
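To make the tree-based approach concrete, the following is a minimal sketch of CART-style tree induction for boundary prediction, assuming each word juncture is described by a few categorical features and a hand-assigned label. The feature names (`punct`, `next_pos`), the six toy junctures, and the helper functions are hypothetical illustrations, not the feature set, data, or implementation used in the paper. The sketch uses Gini impurity with binary equality tests and omits the pruning step that full CART applies.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, value) equality test that most reduces
    weighted Gini impurity, or None if no test improves on the parent."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for feat in rows[0]:
        for val in sorted({r[feat] for r in rows}, key=str):
            yes = [l for r, l in zip(rows, labels) if r[feat] == val]
            no = [l for r, l in zip(rows, labels) if r[feat] != val]
            if not yes or not no:
                continue  # a test that sends everything one way is useless
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if score < best_score - 1e-12:
                best, best_score = (feat, val), score
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree recursively; leaves hold the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feat, val = split
    yes_idx = [i for i, r in enumerate(rows) if r[feat] == val]
    no_idx = [i for i, r in enumerate(rows) if r[feat] != val]
    return {
        "test": split,
        "yes": grow([rows[i] for i in yes_idx],
                    [labels[i] for i in yes_idx], depth + 1, max_depth),
        "no": grow([rows[i] for i in no_idx],
                   [labels[i] for i in no_idx], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the tests from the root down to a leaf label."""
    while isinstance(tree, dict):
        feat, val = tree["test"]
        tree = tree["yes"] if row[feat] == val else tree["no"]
    return tree

# Hypothetical toy data: one row per word juncture.
# punct    = punctuation at the juncture ("," or "none")
# next_pos = part of speech of the following word
rows = [
    {"punct": ",", "next_pos": "CONJ"},
    {"punct": ",", "next_pos": "NOUN"},
    {"punct": "none", "next_pos": "NOUN"},
    {"punct": "none", "next_pos": "VERB"},
    {"punct": "none", "next_pos": "CONJ"},
    {"punct": "none", "next_pos": "PREP"},
]
labels = ["boundary", "boundary", "none", "none", "boundary", "none"]

tree = grow(rows, labels)
print(tree)
print(predict(tree, {"punct": "none", "next_pos": "VERB"}))
```

On this toy data the root test is whether the juncture carries a comma, and the remaining junctures are then split on whether the next word is a conjunction. A learned tree of this kind can be read off directly as a set of phrasing rules, which is what makes the representation convenient for a rule-driven synthesizer.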
