Prosodic boundary detection using syntactic and acoustic information

Abstract This paper presents a two-stage procedure for automatic prosodic boundary detection in Russian based on textual and acoustic data. The key idea of the method is (1) to predict all potential prosodic boundaries from syntax and (2) to select, among these potential boundaries, those that are marked acoustically. For the first stage we developed a system that predicts a potential boundary wherever two adjacent words are not syntactically connected; for this we used a dependency tree parser supplemented with several simple rules. At the second stage we ran a random forest classifier to detect the actual prosodic boundaries using a small set of acoustic features. Of all the prosodic features examined, pause duration worked best, and for some speakers it could be used as the only acoustic cue with no change in performance. For other speakers, however, additional features were useful, such as tempo and amplitude resets or F0 range, and the choice of features was speaker-dependent. Overall, the procedure achieved an F1 measure of 0.91, with recall of 0.90 and precision of 0.93, which is the best published result for Russian.
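
The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy English sentence, the head indices and the 100 ms pause threshold are all assumptions made for illustration. Stage 1 applies only the adjacency criterion stated in the abstract (the additional rules are not described there), and stage 2 replaces the random forest classifier with a plain pause-duration threshold, the simplest stand-in for the single cue that the abstract reports as sufficient for some speakers.

```python
from typing import List, Sequence


def potential_boundaries(heads: Sequence[int]) -> List[int]:
    """Stage 1 (sketch): syntax-based candidate boundaries.

    heads[i] is the index of the syntactic head of word i, or -1 for the root.
    A candidate boundary after word i is proposed when words i and i+1 are
    not directly linked by a dependency arc in either direction.
    """
    candidates = []
    for i in range(len(heads) - 1):
        linked = heads[i] == i + 1 or heads[i + 1] == i
        if not linked:
            candidates.append(i)
    return candidates


def acoustic_boundaries(
    candidates: Sequence[int],
    pauses_ms: Sequence[float],
    min_pause_ms: float = 100.0,  # hypothetical threshold, not from the paper
) -> List[int]:
    """Stage 2 (stand-in): keep candidates followed by a long enough pause.

    pauses_ms[i] is the pause duration between word i and word i+1.
    The paper uses a random forest over several acoustic features instead.
    """
    return [i for i in candidates if pauses_ms[i] >= min_pause_ms]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy example with hand-written (assumed) dependency heads.
words = ["old", "men", "slept", "while", "dogs", "barked"]
heads = [1, 2, -1, 5, 5, 2]
pauses_ms = [0.0, 0.0, 250.0, 10.0, 0.0]  # invented pause durations

candidates = potential_boundaries(heads)
detected = acoustic_boundaries(candidates, pauses_ms)
```

In this toy input, stage 1 proposes candidates after "slept" and after "while", because neither pair of neighbours shares a dependency arc, and stage 2 keeps only the one followed by a long pause. The reported scores are also mutually consistent: `f1_score(0.93, 0.90)` rounds to 0.91.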
