Bootstrapping Word Boundaries: A Bottom-up Corpus-Based Approach to Speech Segmentation

Speech is continuous, and isolating meaningful chunks for lexical access is a nontrivial problem. In this paper we use neural network models and more conventional statistics to study the use of sequential phonological probabilities in the segmentation of an idealized phonological transcription of the London-Lund Corpus; these speech data are representative of genuine conversational English. We demonstrate, first, that the distribution of phonetic segments in English is an important cue to segmentation, and, second, that the distributional information is such that it might allow the infant, beginning with only a sensitivity to the statistics of subsegmental primitives, to bootstrap into a series of increasingly sophisticated segmentation competences, ending with an adult competence. We discuss the relation between the behavior of the models and existing psycholinguistic studies of speech segmentation. In particular, we confirm the utility of the Metrical Segmentation Strategy (Cutler & Norris, 1988) and demonstrate a route by which this utility might be recognized by the infant, without requiring the prior specification of categories like "syllable" or "strong syllable."
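The central claim above is that the sequential distribution of segments carries usable information about word boundaries, even for a learner with no lexicon. Here is a deliberately simple illustration of that idea. It is not the neural network models or statistics reported in the paper. It is a swapped-in utterance-edge heuristic, and it assumes a hypothetical toy corpus in which each character stands for one phonological segment and the learner knows only where utterances begin and end, not where words do. A boundary is posited between segments a and b when a often ends utterances and b often begins them. The names `edge_probabilities` and `segment` and the threshold of 0.1 are illustrative assumptions.

```python
from collections import Counter


def edge_probabilities(utterances):
    """Estimate, for each segment, how often it ends or begins an utterance.

    utterances: unsegmented strings, one character per phonological segment
    (a hypothetical toy encoding, not the paper's transcription).
    """
    count, ends, starts = Counter(), Counter(), Counter()
    for utt in utterances:
        if not utt:
            continue
        count.update(utt)      # occurrences of every segment
        ends[utt[-1]] += 1     # segment at the right utterance edge
        starts[utt[0]] += 1    # segment at the left utterance edge
    p_end = {s: ends[s] / count[s] for s in count}
    p_start = {s: starts[s] / count[s] for s in count}
    return p_end, p_start


def segment(utt, p_end, p_start, threshold=0.1):
    """Posit a word boundary between adjacent segments a and b when
    P(utterance ends after a) * P(utterance begins with b) > threshold.
    Segments never seen in training get probability 0 (an assumption)."""
    words, start = [], 0
    for i in range(1, len(utt)):
        score = p_end.get(utt[i - 1], 0.0) * p_start.get(utt[i], 0.0)
        if score > threshold:
            words.append(utt[start:i])
            start = i
    words.append(utt[start:])
    return words


# Hypothetical toy corpus: the "words" ba, di, gu in varying two-word orders.
corpus = ["badi", "digu", "guba", "bagu", "diba", "gudi"]
p_end, p_start = edge_probabilities(corpus)
print(segment("badigu", p_end, p_start))
```

On this toy corpus the call prints `['ba', 'di', 'gu']`, although the three-word string never occurs in training. Using utterance edges as the only supervision is one way a learner could bootstrap initial boundary guesses from distributional statistics alone. Real conversational transcriptions would likely need smoothing for rare segments and a threshold tuned on held-out data.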

References

[1] D. Norris et al. The Possible-Word Constraint in the Segmentation of Continuous Speech. 1997, Cognitive Psychology.

[2] R. Shillcock et al. The role of phonotactic range in the order of acquisition of English consonants. 1997.

[3] T. A. Cartwright et al. Distributional regularity and phonotactic constraints are useful for segmentation. 1996, Cognition.

[4] A. Roli. Artificial Neural Networks. 2012, Lecture Notes in Computer Science.

[5] Nick Chater et al. A statistical analysis of an idealised phonological transcription of the London-Lund corpus. 1995.

[6] Anne Cutler et al. Competition and segmentation in spoken word recognition. 1994, ICSLP.

[7] D. Norris. Shortlist: a connectionist model of continuous speech recognition. 1994, Cognition.

[8] D. Norris et al. Competition in spoken word recognition: Spotting words in other words. 1994.

[9] Badrinath Roysam et al. Joint solution of low, intermediate, and high-level vision tasks by evolutionary optimization: Application to computer vision at low SNR. 1994, IEEE Trans. Neural Networks.

[10] Eugene Charniak et al. Statistical language learning. 1997.

[11] P. Jusczyk et al. Infants' Sensitivity to the Sound Patterns of Native Language Words. 1993.

[12] P. Jusczyk et al. Infants' preference for the predominant stress patterns of English words. 1993, Child Development.

[13] LouAnn Gerken et al. Interplay of Function Morphemes and Prosody in Early Language. 1993.

[14] A. Friederici et al. Phonotactic knowledge of word boundaries and its use in infant speech perception. 1993, Perception & Psychophysics.

[15] J. Mehler et al. Mora or syllable? Speech segmentation in Japanese. 1993.

[16] A. Cutler. Phonological cues to open- and closed-class words in the processing of spoken sentences. 1993, Journal of Psycholinguistic Research.

[17] A. Norman Redlich et al. Redundancy Reduction as a Strategy for Unsupervised Learning. 1993, Neural Computation.

[18] J. Werker. Developmental changes in cross-language speech perception: Implications for cognitive models of speech processing. 1993.

[19] Kari Suomi et al. An outline of a developmental model of adult phonological organization and behavior. 1993.

[20] N. Chater et al. Processing Time-Warped Sequences Using Recurrent Neural Networks: Modeling Rate-Dependent Factors in Speech Perception. 1993.

[21] Richard Shillcock et al. Cognitive models of speech processing: the Second Sperlonga Meeting. 1993.

[22] J. Mehler et al. The periodicity bias. 1993.

[23] Anne Cutler et al. The monolingual nature of speech segmentation by bilinguals. 1992, Cognitive Psychology.

[24] A. Cutler et al. Rhythmic cues to speech segmentation: Evidence from juncture misperception. 1992.

[25] Nick Chater et al. Finding Linguistic Structure with Recurrent Neural Networks. 1992.

[26] Geoff Williams et al. Automatic speech recognition: a principle-based approach. 1992.

[27] Nick Chater et al. A phonologically motivated input representation for the modelling of auditory word perception in continuous speech. 1992.

[28] Andrew S. Noetzel et al. Forced Simple Recurrent Neural Networks and Grammatical Inference. 1992.

[29] N. Cowan. Recurrent speech patterns as cues to the segmentation of multisyllabic sequences. 1991, Acta Psychologica.

[30] Emmanuel Dupoux et al. Constraining models of lexical access: the onset of word recognition. 1991.

[31] Dennis Norris et al. A dynamic-net model of human speech recognition. 1991.

[32] Ulrich H. Frauenfelder et al. Lexical segmentation in TRACE: an exercise in simulation. 1991.

[33] David Zipser et al. Unsupervised Discovery of Speech Segments Using Recurrent Networks. 1991.

[34] James L. McClelland et al. Learning and Applying Contextual Constraints in Sentence Comprehension. 1990, Artif. Intell.

[35] Jeffrey L. Elman et al. Finding Structure in Time. 1990, Cogn. Sci.

[36] G. Altmann et al. Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives - Workshop Overview. 1989, AI Mag.

[37] Pierre Perruchet et al. Synthetic Grammar Learning: Implicit Rule Abstraction or Explicit Fragmentary Knowledge? 1990.

[38] Anne Cutler et al. Auditory lexical access: where do we start? 1989.

[39] Geoffrey E. Hinton. Connectionist Learning Procedures. 1989, Artif. Intell.

[40] James L. McClelland et al. Finite State Automata and Simple Recurrent Networks. 1989, Neural Computation.

[41] D. Lightfoot. The child's trigger experience: Degree-0 learnability. 1989, Behavioral and Brain Sciences.

[42] R. Shillcock et al. The recognition of words after their acoustic offsets in spontaneous speech: Effects of subsequent context. 1988, Perception & Psychophysics.

[43] Jonathan Harrington et al. Word Boundary Identification from Phoneme Sequence Constraints in Automatic Continuous Speech Recognition. 1988, COLING.

[44] C. Best et al. Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants. 1988, Journal of Experimental Psychology: Human Perception and Performance.

[45] D. Zipser et al. Learning the hidden structure of speech. 1988, The Journal of the Acoustical Society of America.

[46] James L. McClelland et al. Cognitive penetration of the mechanisms of perception: Compensation for coarticulation of lexically restored phonemes. 1988.

[47] Anne Cutler et al. The role of strong syllables in segmentation for lexical access. 1988.

[48] Richard Shillcock et al. Some prosodic effects on human word recognition in continuous speech. 1988.

[49] Anne Cutler et al. The predominance of strong initial syllables in the English vocabulary. 1987.

[50] M. Tanenhaus et al. Context effects in lexical processing. 1987, Cognition.

[51] J. Gee et al. Prosodic structure and spoken word recognition. 1987, Cognition.

[52] Kenneth Ward Church et al. Phonological parsing and lexical retrieval. 1987, Cognition.

[53] Lorraine Komisarjevsky Tyler et al. Spoken word recognition. 1987.

[54] Anne Cutler et al. The syllable's differing role in the segmentation of French and English. 1986.

[55] Jeffrey L. Elman et al. Interactive processes in speech perception: the TRACE model. 1986.

[56] Geoffrey E. Hinton et al. Learning internal representations by error propagation. 1986.

[57] D. Rumelhart. Learning Internal Representations by Error Propagation. In Parallel Distributed Processing. 1986.

[58] Jean Lowenstamm et al. The internal structure of phonological elements: a theory of charm and government. 1985, Phonology Yearbook.

[59] C. Snow et al. Child language data exchange system. 1984, Journal of Child Language.

[60] J. Werker et al. Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. 1984.

[61] Morton Ann Gernsbacher et al. Cracking the Dual Code: Toward a Unitary Model of Phoneme Identification. 1983, Journal of Verbal Learning and Verbal Behavior.

[62] Ann M. Peters et al. The Units of Language Acquisition. 1983.

[63] Daniel P. Huttenlocher et al. Phonotactic and Lexical Constraints in Speech Recognition. 1983, AAAI.

[64] P. Kuhl. Perception of auditory equivalence classes for speech in early infancy. 1983.

[65] J. Fodor. The Modularity of Mind: An Essay on Faculty Psychology. 1986.

[66] Ulrich Hans Frauenfelder et al. The syllable's role in speech segmentation. 1981.

[67] S. S. Marcus. ERIS: context-sensitive coding in speech perception. 1981.

[68] J. Mehler et al. Syllables as units in infant speech perception. 1981.

[69] R. Quirk et al. A Corpus of English Conversation. 1980.

[70] Dennis H. Klatt et al. Speech perception: a model of acoustic–phonetic analysis and lexical access. 1979.

[71] William D. Marslen-Wilson et al. Processing interactions and lexical access during word recognition in continuous speech. 1978, Cognitive Psychology.

[72] B. MacWhinney. The Acquisition of Morphophonology. 1978.

[73] I. Lehiste. The Timing of Utterances and Linguistic Boundaries. 1972.

[74] Zellig S. Harris et al. From Phoneme to Morpheme. 1955.