Phrase Boundary Assignment from Text in Multiple Domains

Detecting and modeling proper phrasing from an input text string is important for producing synthesized speech that sounds intelligible and natural. Knowledge of proper phrase structure influences, e.g., the placement and length of pauses and the realization of phrase-final boundary contours, both of which can affect a listener’s percepts, ranging from naturalness to semantic interpretation. In this work, we model the occurrence and types of phrase breaks from purely textual features, paying close attention to how system performance generalizes in- and out-of-domain for corpora of various types (such as broadcast news, spontaneous speech, and synthesis databases), and as a function of the subsets of syntactic and lexical features investigated.
