Unsupervised learning of vowel categories from infant-directed speech

Infants rapidly learn the sound categories of their native language, even though they do not receive explicit or focused training. Recent research suggests that this learning is due to infants' sensitivity to the distribution of speech sounds and that infant-directed speech contains the distributional information needed to form native-language vowel categories. An algorithm, based on Expectation–Maximization, is presented here for learning the categories from a sequence of vowel tokens without (i) receiving any category information with each vowel token, (ii) knowing in advance the number of categories to learn, or (iii) having access to the entire data ensemble. When exposed to vowel tokens drawn from either English or Japanese infant-directed speech, the algorithm successfully discovered the language-specific vowel categories (/ɪ, i, ε, e/ for English, /i, iː, e, eː/ for Japanese). A nonparametric version of the algorithm, closely related to neural network models based on topographic representation and competitive Hebbian learning, was also able to discover the vowel categories, albeit somewhat less reliably. These results reinforce the proposal that native-language speech categories are acquired through distributional learning and that such learning may be instantiated in a biologically plausible manner.
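The three constraints listed above (no labels, unknown number of categories, one token at a time) can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes one-dimensional tokens, a fixed step size, a winner-take-all update of the mixing weights, and a pruning threshold, and the names `online_mixture`, `step`, and `prune_below` are hypothetical. The data are synthetic, not vowel measurements.

```python
import numpy as np

def online_mixture(tokens, init_means, init_var, step=0.02, prune_below=0.02):
    """Fit a 1-D Gaussian mixture one token at a time, pruning weak components.

    Illustrative only: the update rules and thresholds are assumptions of this sketch.
    """
    mu = np.array(init_means, dtype=float)
    var = np.full(mu.shape, float(init_var))
    w = np.full(mu.shape, 1.0 / mu.size)
    for x in tokens:
        # E-step for a single unlabeled token: responsibilities in log space.
        log_p = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var
        r = np.exp(log_p - log_p.max())
        r /= r.sum()
        # Incremental M-step: nudge each component toward the token,
        # in proportion to its responsibility. The token is then discarded.
        mu += step * r * (x - mu)
        var += step * r * ((x - mu) ** 2 - var)
        # Competition (assumed): only the most responsible component gains weight.
        winner = np.zeros_like(w)
        winner[np.argmax(r)] = 1.0
        w += step * (winner - w)
        # Prune components whose weight has decayed; always keep the strongest.
        keep = w >= prune_below
        keep[np.argmax(w)] = True
        mu, var, w = mu[keep], var[keep], w[keep] / w[keep].sum()
    return mu, var, w

# Synthetic stream: two well-separated categories, presented in random order.
rng = np.random.default_rng(0)
tokens = np.concatenate([rng.normal(0.0, 1.0, 2000), rng.normal(10.0, 1.0, 2000)])
rng.shuffle(tokens)

# Start with more components than categories and let competition remove the surplus.
mu, var, w = online_mixture(tokens, init_means=np.linspace(-3, 13, 6), init_var=4.0)
print(len(mu), np.round(np.sort(mu), 2), np.round(w, 2))
```

Starting with surplus components and letting competition eliminate the ones that rarely win is one simple way to avoid fixing the number of categories in advance. How many components survive depends on the step size and the pruning threshold.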
