Syllable weight encodes mostly the same information for English word segmentation as dictionary stress

Stress is a useful cue for English word segmentation. Across a wide range of computational models, adding stress cues improves segmentation accuracy by 2-10%, depending on the kind of model, when the input has been annotated with stress from a pronouncing dictionary. However, stress is neither invariably produced nor unambiguously identifiable in real speech. Heavy syllables, i.e. those with long vowels or syllable codas, attract stress in English. We devise Adaptor Grammar word segmentation models that exploit stress, syllable weight, or both, and evaluate the utility of syllable weight as a cue to word boundaries. Our results suggest that syllable weight encodes largely the same information for word segmentation in English that annotated dictionary stress does.
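
The definition of a heavy syllable given above (a long vowel or a syllable coda) can be illustrated with a minimal sketch. This is not the paper's Adaptor Grammar model; it only shows how a weight label could be derived from a syllabified transcription. The ARPAbet-style vowel inventory, the (onset, nucleus, coda) representation, and the function names are all assumptions made for illustration.

```python
# Assumed inventory of long vowels and diphthongs, written with
# ARPAbet-style symbols. This set is illustrative, not taken from the paper.
LONG_VOWELS = {"IY", "EY", "AA", "AO", "OW", "UW", "AY", "AW", "OY", "ER"}


def is_heavy(nucleus, coda):
    """Return True if the syllable has a long vowel or a non-empty coda."""
    return nucleus in LONG_VOWELS or len(coda) > 0


def weight_pattern(syllables):
    """Map (onset, nucleus, coda) triples to 'H' (heavy) or 'L' (light)."""
    return ["H" if is_heavy(nucleus, coda) else "L"
            for (_onset, nucleus, coda) in syllables]


# Hypothetical syllabification of "happy": HH AE . P IY
happy = [(["HH"], "AE", []), (["P"], "IY", [])]
pattern = weight_pattern(happy)
```

Under these assumptions the first syllable of "happy" is light (short vowel, no coda) and the second is heavy (long vowel). A segmentation model could then see such weight labels in place of, or alongside, dictionary stress marks.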
