Preserving subsegmental variation in modeling word segmentation (or, the raising of baby Mondegreen)

Many computational models have been developed to show how infants break apart utterances into words prior to building a vocabulary, a problem known as the “word segmentation task.” Most models assume that infants, upon hearing an utterance, represent this input as a string of segments. One type of model uses statistical cues calculated from the distribution of segments within child-directed speech to locate the points most likely to contain word boundaries. However, these models have been tested on relatively few languages, with little attention paid to how different phonological structures may affect the relative effectiveness of particular statistical heuristics. This dissertation addresses this issue by comparing the performance of two classes of distribution-based statistical cues on a corpus of Modern Greek, a language whose phonotactic structure differs significantly from that of English, and shows how these differences change the relative effectiveness of the cues. Another fundamental issue critically examined in this dissertation is the practice of representing input as a string of segments. Such a representation implicitly assumes complete certainty as to the phonemic identity of each segment. This assumption runs counter to standard practice in automatic speech recognition (where “hard decisions” are eschewed) and, more crucially, overestimates the ability of infants to parse and identify those segments from the spoken input. If even adult native speakers (with the benefit of higher-level linguistic knowledge, such as a
