Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation; for instance, "the" might be realized as [ði] or [ðə]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold-standard or automatically induced word boundaries. In both cases, modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.
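To make the log-linear pronunciation model concrete, the sketch below scores candidate surface pronunciations of an underlying form with weighted features and normalizes the scores into a distribution. This is only a minimal illustration of the general log-linear technique: the feature names, weights, and the naive segment-by-segment alignment are all invented for this example and are not the paper's actual articulatory feature set or learned parameters.

```python
import math

# Hypothetical feature weights (illustrative only, not learned values).
WEIGHTS = {
    "faithful": 2.0,       # surface segment matches the underlying segment
    "vowel-reduce": 0.5,   # underlying vowel surfaces as schwa [ə]
    "other-change": -1.0,  # any other substitution
}

def features(underlying, surface):
    """Count which features fire, pairing segments position by position
    (a real model would align strings with a transducer)."""
    counts = {}
    for u, s in zip(underlying, surface):
        if u == s:
            f = "faithful"
        elif s == "ə":
            f = "vowel-reduce"
        else:
            f = "other-change"
        counts[f] = counts.get(f, 0) + 1
    return counts

def score(underlying, surface):
    """Log-linear score: dot product of weights and feature counts."""
    return sum(WEIGHTS[f] * c for f, c in features(underlying, surface).items())

def pronunciation_probs(underlying, candidates):
    """Softmax-normalize scores over a fixed candidate set of surface forms."""
    scores = {s: score(underlying, s) for s in candidates}
    z = sum(math.exp(v) for v in scores.values())
    return {s: math.exp(v) / z for s, v in scores.items()}

probs = pronunciation_probs("ði", ["ði", "ðə", "di"])
```

Under these toy weights, the faithful pronunciation [ði] gets the highest probability, the reduced variant [ðə] comes next, and the unrelated change [di] is penalized, mirroring the idea that articulatory-feature-based features let the model prefer plausible variants of a lexical item.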
