Combining Syntactic and Acoustic Features for Prosodic Boundary Detection in Russian

This paper presents a two-step method for automatic prosodic boundary detection that uses both textual and acoustic features. First, we predict possible boundary positions using textual features; second, we detect the actual boundaries at the predicted positions using acoustic features. To evaluate the algorithms we use a 26-hour subcorpus of CORPRES, a prosodically annotated corpus of Russian read speech. We also conducted two independent experiments using acoustic features and textual features separately. Acoustic features alone achieve an F\(_1\) measure of 0.85, with precision of 0.94 and recall of 0.78. Textual features alone achieve an F\(_1\) measure of 0.84, with precision of 0.84 and recall of 0.83. The proposed two-step approach, which combines the two groups of features, yields an efficiency of 0.90, with recall of 0.85 and precision of 0.99. It preserves the high recall provided by textual information and the high precision achieved with acoustic information. This is the best published result for Russian.
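The two-step idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy pause durations, and the 0.1 s pause threshold are all hypothetical stand-ins for the textual and acoustic classifiers, which the abstract does not specify. The sketch only shows the cascade structure (text proposes candidate positions, acoustics confirms them) and how precision, recall and F\(_1\) are computed from the result.

```python
from typing import Callable, Set, Tuple


def two_step_boundaries(
    n_junctures: int,
    text_predicts: Callable[[int], bool],
    acoustics_confirm: Callable[[int], bool],
) -> Set[int]:
    """Cascade: keep only text-proposed positions that acoustics confirms."""
    # Step 1: candidate positions proposed from textual features.
    candidates = [i for i in range(n_junctures) if text_predicts(i)]
    # Step 2: acoustic check, applied at the candidate positions only.
    return {i for i in candidates if acoustics_confirm(i)}


def precision_recall_f1(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Standard precision, recall and F1 over sets of boundary positions."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical): 10 word junctures, gold boundaries at 2, 5 and 8.
gold = {2, 5, 8}
text_candidates = {2, 4, 5, 8}  # assumed output of a text-based predictor
pause_sec = [0.0, 0.0, 0.30, 0.0, 0.02, 0.25, 0.0, 0.0, 0.05, 0.0]  # assumed pauses

detected = two_step_boundaries(
    n_junctures=len(pause_sec),
    text_predicts=lambda i: i in text_candidates,
    acoustics_confirm=lambda i: pause_sec[i] > 0.1,  # assumed threshold
)
precision, recall, f1 = precision_recall_f1(detected, gold)
print(sorted(detected), precision, recall, f1)
```

In this toy run the acoustic step rejects the false candidate at position 4 but also misses the true boundary at position 8, which mirrors the trade-off reported above: the second step raises precision, while recall is bounded by what the first step proposes.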
