Unsupervised Lexicon Discovery from Acoustic Input

We present a model of unsupervised phonological lexicon discovery—the problem of simultaneously learning phoneme-like and word-like units from acoustic input. Our model builds on earlier models of unsupervised phone-like unit discovery from acoustic data (Lee and Glass, 2012), and unsupervised symbolic lexicon discovery using the Adaptor Grammar framework (Johnson et al., 2006), integrating these earlier approaches using a probabilistic model of phonological variation. We show that the model is competitive with state-of-the-art spoken term discovery systems, and present analyses exploring the model’s behavior and the kinds of linguistic structures it learns.

[1]  Micha Elsner,et al.  A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability , 2013, EMNLP.

[2]  M. Brent.  Speech segmentation and word discovery: a computational perspective , 1999, Trends in Cognitive Sciences.

[3]  Carl de Marcken,et al.  Unsupervised language acquisition , 1996, ArXiv.

[4]  Timothy O'Donnell,et al.  Productivity and Reuse in Language: A Theory of Linguistic Computation and Storage , 2015 .

[5]  Lin-Shan Lee,et al.  Unsupervised discovery of linguistic structure including two-level acoustic patterns using three cascaded stages of iterative optimization , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6]  Mark Johnson,et al.  Nonparametric Bayesian models of lexical acquisition , 2007 .

[7]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[8]  Jim Pitman,et al.  The two-parameter generalization of Ewens' random partition structure , 2003 .

[9]  Andrew Y. Ng,et al.  Solving the Problem of Cascading Errors: Approximate Bayesian Inference for Linguistic Annotation Pipelines , 2006, EMNLP.

[10]  Carl de Marcken.  Linguistic Structure as Composition and Perturbation , 1996, ACL.

[11]  M. Picheny,et al.  Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences , 2017 .

[12]  James R. Glass,et al.  Unsupervised Pattern Discovery in Speech , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  Odette Scharenborg,et al.  Unsupervised speech segmentation: an analysis of the hypothesized phone boundaries. , 2010, The Journal of the Acoustical Society of America.

[14]  Bhiksha Raj,et al.  Unsupervised word segmentation from noisy input , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[15]  T. Poggio,et al.  MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 2001 .

[16]  Mari Ostendorf,et al.  Joint lexicon, acoustic unit inventory and model design , 1999, Speech Commun.

[17]  James R. Glass,et al.  One-shot learning of generative speech concepts , 2014, CogSci.

[18]  Michael R. Brent,et al.  An Efficient, Probabilistically Sound Algorithm for Segmentation and Word Discovery , 1999, Machine Learning.

[19]  Fergus R. McInnes,et al.  Unsupervised Extraction of Recurring Words from Infant-Directed Speech , 2011, CogSci.

[20]  Shlomo Argamon,et al.  Efficient Unsupervised Recursive Word Segmentation Using Minimum Description Length , 2004, COLING.

[21]  Steve Young,et al.  Applications of stochastic context-free grammars using the Inside-Outside algorithm , 1990 .

[22]  Joshua Goodman,et al.  Parsing Inside-Out , 1998, ArXiv.

[23]  Hung-An Chang,et al.  Resource configurable spoken query detection using Deep Boltzmann Machines , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[24]  Dan Klein,et al.  The Infinite PCFG Using Hierarchical Dirichlet Processes , 2007, EMNLP.

[25]  Thomas L. Griffiths,et al.  Bayesian Inference for PCFGs via Markov Chain Monte Carlo , 2007, NAACL.

[26]  Kenneth Ward Church,et al.  Towards spoken term discovery at scale with zero resources , 2010, INTERSPEECH.

[27]  R N Aslin,et al.  Statistical Learning by 8-Month-Old Infants , 1996, Science.

[28]  Z. Harris.  From Phoneme to Morpheme , 1955 .

[29]  Kenneth Ward Church,et al.  A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[30]  Mark Johnson,et al.  Unsupervised phonemic Chinese word segmentation using Adaptor Grammars , 2010, COLING.

[31]  Naonori Ueda,et al.  Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling , 2009, ACL.

[32]  Chia-ying Lee,et al.  Discovering linguistic structures in speech: models and applications , 2014 .

[33]  S. Chib,et al.  Understanding the Metropolis-Hastings Algorithm , 1995 .

[34]  Thomas L. Griffiths,et al.  Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models , 2006, NIPS.

[35]  Tatsuya Kawahara,et al.  Bayesian Learning of a Language Model from Continuous Speech , 2012, IEICE Trans. Inf. Syst..

[36]  James R. Glass,et al.  A Nonparametric Bayesian Approach to Acoustic Model Discovery , 2012, ACL.

[37]  Vladimir Solmon,et al.  The estimation of stochastic context-free grammars using the Inside-Outside algorithm , 2003 .

[38]  James R. Glass,et al.  Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[39]  Michael C. Frank,et al.  Modeling human performance in statistical word segmentation , 2010, Cognition.

[40]  Guillaume Aimetti,et al.  Modelling Early Language Acquisition Skills: Towards a General Statistical Learning Mechanism , 2009, EACL.

[41]  Mark Johnson,et al.  Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure , 2008, ACL.

[42]  John Goldsmith,et al.  An algorithm for the unsupervised learning of morphology , 2006, Natural Language Engineering.

[43]  Mark Johnson,et al.  Unsupervised Word Segmentation for Sesotho Using Adaptor Grammars , 2008, SIGMORPHON.

[44]  Yu Zhang,et al.  Joint Learning of Phonetic Units and Word Pronunciations for ASR , 2013, EMNLP.

[45]  John A. Goldsmith,et al.  Unsupervised Learning of the Morphology of a Natural Language , 2001, CL.

[46]  Mark Johnson,et al.  Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars , 2009, NAACL.

[47]  James R. Glass,et al.  Analysis and Processing of Lecture Audio Data: Preliminary Investigations , 2004, Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004 - SpeechIR '04.

[48]  J. Pitman,et al.  The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator , 1997 .

[49]  Yaodong Zhang,et al.  Unsupervised speech processing with applications to query-by-example spoken term detection , 2013 .

[50]  Mark Johnson,et al.  Exploring the Role of Stress in Bayesian Word Segmentation using Adaptor Grammars , 2014, TACL.

[51]  Michael C. Frank,et al.  Learning and Long-Term Retention of Large-Scale Artificial Languages , 2013, PloS one.

[52]  E. Newport,et al.  Word Segmentation: The Role of Distributional Cues , 1996 .

[53]  T. Griffiths,et al.  A Bayesian framework for word segmentation: Exploring the effects of context , 2009, Cognition.

[54]  T. Ferguson.  A Bayesian Analysis of Some Nonparametric Problems , 1973 .

[55]  Mathias Creutz,et al.  Unsupervised models for morpheme segmentation and morphology learning , 2007, TSLP.

[56]  James R. Glass.  A probabilistic framework for segment-based speech recognition , 2003, Comput. Speech Lang.