Towards low-resource prosodic boundary detection

In this study we propose a method for prosodic boundary detection that relies only on acoustic cues easily extracted from the speech signal and requires no supervision. Drawing a parallel between language acquisition in infants and speech processing techniques for under-resourced languages, we build on the findings of several psycholinguistic studies on the cues infants use to identify prosodic boundaries. Several durational and pitch cues were investigated, both individually and in combination, and relatively good performance was achieved. The best result, obtained by combining all the cues, compares well against a previously proposed approach, without relying on any learning method or on lexical or syntactic cues.

Index Terms: prosodic boundaries, acoustic cues, prosody recognition
