Combining models of prosodic phrasing and pausing

This paper describes two approaches to assigning prosodic phrase structure and pauses to text, and investigates the impact of assignment errors for different granularities of prosodic phrase structure. One approach uses a cascade of models trained separately to predict prosodic phrase structure and pauses; the other uses a single model trained directly on the joint prediction task. Objective measurements show similar performance for both approaches, while perceptual evaluations show a slight preference for an optimised cascade of prosodic phrase structure and pause models that uses a single-level encoding of prosodic phrase structure.
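As a rough illustration of the structural difference between the two approaches, the sketch below contrasts a cascade with a joint model. In the cascade, a phrase-break model is trained on its own, and a pause model is trained on the input feature plus the gold break label; at prediction time the pause model is conditioned on the *predicted* break. The joint model is trained once on combined (break, pause) labels. Everything concrete here is a hypothetical simplification and not the paper's method: the `MajorityClassifier` (a per-feature majority-label lookup), the single punctuation-like feature, the toy training data, and the labels `B`/`NB` (break / no break, a single-level encoding) and `P`/`NP` (pause / no pause).

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical toy data: (feature at word boundary, break label, pause label).
TRAIN = [
    (",", "B", "NP"), (",", "B", "P"), (",", "B", "P"),
    (".", "B", "P"),
    ("none", "NB", "NP"), ("none", "NB", "NP"),
    ("and", "B", "NP"), ("and", "NB", "NP"), ("and", "B", "NP"),
]


class MajorityClassifier:
    """Toy stand-in for a real model: predicts the most frequent label per feature value."""

    def __init__(self) -> None:
        self.counts: Dict[Hashable, Counter] = defaultdict(Counter)
        self.fallback: Counter = Counter()  # overall label counts for unseen features

    def fit(self, xs: Sequence[Hashable], ys: Sequence[Hashable]) -> "MajorityClassifier":
        for x, y in zip(xs, ys):
            self.counts[x][y] += 1
            self.fallback[y] += 1
        return self

    def predict(self, x: Hashable) -> Hashable:
        table = self.counts.get(x) or self.fallback
        return table.most_common(1)[0][0]


def train_cascade(data):
    """Train the break model and the pause model separately."""
    feats, breaks, pauses = zip(*data)
    break_model = MajorityClassifier().fit(feats, breaks)
    # The pause model sees the feature plus the gold break label during training.
    pause_model = MajorityClassifier().fit(list(zip(feats, breaks)), pauses)
    return break_model, pause_model


def predict_cascade(models, feats: Sequence[str]) -> List[Tuple[str, str]]:
    """Predict breaks first, then pauses conditioned on the predicted break."""
    break_model, pause_model = models
    out = []
    for f in feats:
        b = break_model.predict(f)
        out.append((b, pause_model.predict((f, b))))
    return out


def train_joint(data) -> MajorityClassifier:
    """Train one model on the combined (break, pause) label space."""
    feats = [f for f, _, _ in data]
    labels = [(b, p) for _, b, p in data]
    return MajorityClassifier().fit(feats, labels)


def predict_joint(model: MajorityClassifier, feats: Sequence[str]) -> List[Tuple[str, str]]:
    return [model.predict(f) for f in feats]


if __name__ == "__main__":
    test_feats = [",", ".", "none", "and"]
    print(predict_cascade(train_cascade(TRAIN), test_feats))
    # [('B', 'P'), ('B', 'P'), ('NB', 'NP'), ('B', 'NP')]
    print(predict_joint(train_joint(TRAIN), test_feats))
    # [('B', 'P'), ('B', 'P'), ('NB', 'NP'), ('B', 'NP')]
```

On this toy data the two approaches agree. The design difference is that the cascade lets each model be trained and tuned on its own, while the joint model must cover the product of the break and pause label sets. That product grows quickly if the single-level `B`/`NB` encoding is replaced by a multi-level one.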
