Incorporating Prosodic Boundaries in Unsupervised Term Discovery

We present a preliminary investigation of the usefulness of prosodic boundaries for unsupervised term discovery (UTD). Studies in language acquisition show that infants use prosodic boundaries to segment continuous speech into word-like units. We evaluate whether such a strategy could also help UTD algorithms. Running a previously published UTD algorithm (MODIS) on a corpus of prosodically annotated English broadcast news revealed that many discovered terms straddle prosodic boundaries. We then implemented two variants of this algorithm: one that discards straddling items and one that truncates them to the nearest boundary (either a prosodic boundary or a pause marker). Both variants achieved a higher term-matching F-score than the baseline, and higher-level prosodic boundaries were found to be better than lower-level boundaries or pause markers. In addition, the truncation algorithm, but not the discard algorithm, increased word-boundary F-score over the baseline.
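
The two post-processing variants can be illustrated with a minimal sketch. It is not the MODIS implementation. It assumes that each discovered term occurrence is a `(start, end)` time interval in seconds and that boundaries are a list of time points. These could come from prosodic annotation at a chosen break level or from pause markers. It also assumes that an occurrence "straddles" a boundary only when the boundary falls strictly inside it. How truncation chooses which side to keep is an assumption here: the sketch keeps the longest boundary-free piece and drops it if it falls below an optional minimum duration. All function names (`discard_straddling`, `truncate_straddling`, etc.) are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one term occurrence = (start_time, end_time) in seconds.
Segment = Tuple[float, float]


def internal_boundaries(seg: Segment, boundaries: List[float]) -> List[float]:
    """Return the boundary times that fall strictly inside the segment, in sorted order."""
    start, end = seg
    return sorted(b for b in boundaries if start < b < end)


def discard_straddling(segments: List[Segment], boundaries: List[float]) -> List[Segment]:
    """Discard variant: keep only occurrences that contain no boundary."""
    return [s for s in segments if not internal_boundaries(s, boundaries)]


def truncate_to_boundary(
    seg: Segment, boundaries: List[float], min_dur: float = 0.0
) -> Optional[Segment]:
    """Truncation variant (assumed rule): cut the occurrence at every internal
    boundary and keep the longest resulting piece, or None if it is too short."""
    inside = internal_boundaries(seg, boundaries)
    if not inside:
        return seg  # no straddling, so the occurrence is left unchanged
    cuts = [seg[0]] + inside + [seg[1]]
    pieces = list(zip(cuts[:-1], cuts[1:]))
    best = max(pieces, key=lambda p: p[1] - p[0])
    return best if best[1] - best[0] >= min_dur else None


def truncate_straddling(
    segments: List[Segment], boundaries: List[float], min_dur: float = 0.0
) -> List[Segment]:
    """Apply truncation to every occurrence, dropping pieces shorter than min_dur."""
    out = []
    for s in segments:
        t = truncate_to_boundary(s, boundaries, min_dur)
        if t is not None:
            out.append(t)
    return out
```

For example, with boundaries at 1.0 s and 2.5 s and occurrences `(0.2, 0.9)`, `(0.8, 1.6)` and `(2.0, 3.4)`, the discard variant keeps only `(0.2, 0.9)`. The truncation variant returns `(0.2, 0.9)`, `(1.0, 1.6)` and `(2.5, 3.4)`. This toy example reflects the intuition that truncation preserves more material, and therefore more boundary information, than discarding.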