Unsupervised Lexicon Discovery from Acoustic Input

We present a model of unsupervised phonological lexicon discovery—the problem of simultaneously learning phoneme-like and word-like units from acoustic input. Our model builds on earlier models of unsupervised phone-like unit discovery from acoustic data (Lee and Glass, 2012), and unsupervised symbolic lexicon discovery using the Adaptor Grammar framework (Johnson et al., 2006), integrating these earlier approaches using a probabilistic model of phonological variation. We show that the model is competitive with state-of-the-art spoken term discovery systems, and present analyses exploring the model’s behavior and the kinds of linguistic structures it learns.
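The Adaptor Grammar framework named in the abstract builds a lexicon by caching and reusing previously generated units through Pitman-Yor adaptors (see [34] and [48] below). The sketch below is a minimal, illustrative Chinese-restaurant-process view of that caching idea only. It is not the authors' model: it omits the grammar, the acoustic model, and the phonological variation component. The class name `PitmanYorAdaptor`, the parameter names `a` (discount) and `b` (concentration), and the toy phone inventory are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


class PitmanYorAdaptor:
    """Hypothetical Pitman-Yor (Chinese restaurant process) cache over a base generator.

    `base` is any zero-argument function returning a fresh item; here it
    returns a short string of phone-like symbols. Illustration only.
    """

    def __init__(self, base, a=0.5, b=1.0, rng=None):
        assert 0.0 <= a < 1.0 and b > -a
        self.base = base
        self.a = a  # discount
        self.b = b  # concentration
        self.rng = rng if rng is not None else random.Random()
        self.tables = []  # each entry: [cached item, number of customers]
        self.n = 0  # total number of draws so far

    def table_probs(self):
        """Return (probabilities of reusing each cached table, probability of a new table)."""
        denom = self.n + self.b
        existing = [(count - self.a) / denom for _, count in self.tables]
        new = (self.b + self.a * len(self.tables)) / denom
        return existing, new

    def sample(self):
        existing, _ = self.table_probs()
        r = self.rng.random()
        for i, p in enumerate(existing):
            if r < p:
                # Reuse a cached item: the rich-get-richer step.
                self.tables[i][1] += 1
                self.n += 1
                return self.tables[i][0]
            r -= p
        # Otherwise open a new table labelled with a fresh draw from the base.
        item = self.base()
        self.tables.append([item, 1])
        self.n += 1
        return item


# Toy base distribution over "words" made of 1 to 3 phone-like symbols (hypothetical).
PHONES = ["p", "t", "k", "a", "i", "u"]


def random_word(rng):
    length = rng.randint(1, 3)
    return "".join(rng.choice(PHONES) for _ in range(length))


rng = random.Random(0)
adaptor = PitmanYorAdaptor(lambda: random_word(rng), a=0.5, b=1.0, rng=rng)
tokens = [adaptor.sample() for _ in range(200)]
counts = Counter(tokens)
print(len(adaptor.tables), counts.most_common(3))
```

Because the probability of reusing a cached item grows with its count, a small set of frequently drawn "words" comes to dominate the output; this reuse is what lets an adaptor store word-like units rather than regenerating them. Several tables can carry the same label when the base happens to produce the same string twice, which is expected behavior for a Pitman-Yor adaptor.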

James R. Glass | Timothy J. O'Donnell | Chia-ying Lee

References

[1] Micha Elsner, et al. A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability, 2013, EMNLP.

[2] M. Brent. Speech segmentation and word discovery: a computational perspective, 1999, Trends in Cognitive Sciences.

[3] Carl de Marcken, et al. Unsupervised language acquisition, 1996, ArXiv.

[4] Timothy O'Donnell, et al. Productivity and Reuse in Language: A Theory of Linguistic Computation and Storage, 2015.

[5] Lin-Shan Lee, et al. Unsupervised discovery of linguistic structure including two-level acoustic patterns using three cascaded stages of iterative optimization, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6] Mark Johnson, et al. Nonparametric Bayesian models of lexical acquisition, 2007.

[7] F. Jelinek, et al. Continuous speech recognition by statistical methods, 1976, Proceedings of the IEEE.

[8] Jim Pitman, et al. The two-parameter generalization of Ewens' random partition structure, 2003.

[9] Andrew Y. Ng, et al. Solving the Problem of Cascading Errors: Approximate Bayesian Inference for Linguistic Annotation Pipelines, 2006, EMNLP.

[10] Carl de Marcken. Linguistic Structure as Composition and Perturbation, 1996, ACL.

[11] M. Picheny, et al. Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences, 2017.

[12] James R. Glass, et al. Unsupervised Pattern Discovery in Speech, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[13] Odette Scharenborg, et al. Unsupervised speech segmentation: an analysis of the hypothesized phone boundaries, 2010, The Journal of the Acoustical Society of America.

[14] Bhiksha Raj, et al. Unsupervised word segmentation from noisy input, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[15] T. Poggio, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 2001.

[16] Mari Ostendorf, et al. Joint lexicon, acoustic unit inventory and model design, 1999, Speech Commun.

[17] James R. Glass, et al. One-shot learning of generative speech concepts, 2014, CogSci.

[18] Michael R. Brent, et al. An Efficient, Probabilistically Sound Algorithm for Segmentation and Word Discovery, 1999, Machine Learning.

[19] Fergus R. McInnes, et al. Unsupervised Extraction of Recurring Words from Infant-Directed Speech, 2011, CogSci.

[20] Shlomo Argamon, et al. Efficient Unsupervised Recursive Word Segmentation Using Minimum Description Length, 2004, COLING.

[21] Steve Young, et al. Applications of stochastic context-free grammars using the Inside-Outside algorithm, 1990.

[22] Joshua Goodman, et al. Parsing Inside-Out, 1998, ArXiv.

[23] Hung-An Chang, et al. Resource configurable spoken query detection using Deep Boltzmann Machines, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[24] Dan Klein, et al. The Infinite PCFG Using Hierarchical Dirichlet Processes, 2007, EMNLP.

[25] Thomas L. Griffiths, et al. Bayesian Inference for PCFGs via Markov Chain Monte Carlo, 2007, NAACL.

[26] Kenneth Ward Church, et al. Towards spoken term discovery at scale with zero resources, 2010, INTERSPEECH.

[27] R. N. Aslin, et al. Statistical Learning by 8-Month-Old Infants, 1996, Science.

[28] Z. Harris. From Phoneme to Morpheme, 1955.

[29] Kenneth Ward Church, et al. A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[30] Mark Johnson, et al. Unsupervised phonemic Chinese word segmentation using Adaptor Grammars, 2010, COLING.

[31] Naonori Ueda, et al. Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling, 2009, ACL.

[32] Chia-ying Lee, et al. Discovering linguistic structures in speech: models and applications, 2014.

[33] S. Chib, et al. Understanding the Metropolis-Hastings Algorithm, 1995.

[34] Thomas L. Griffiths, et al. Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models, 2006, NIPS.

[35] Tatsuya Kawahara, et al. Bayesian Learning of a Language Model from Continuous Speech, 2012, IEICE Trans. Inf. Syst.

[36] James R. Glass, et al. A Nonparametric Bayesian Approach to Acoustic Model Discovery, 2012, ACL.

[37] Vladimir Solmon, et al. The estimation of stochastic context-free grammars using the Inside-Outside algorithm, 2003.

[38] James R. Glass, et al. Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams, 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[39] Michael C. Frank, et al. Modeling human performance in statistical word segmentation, 2010, Cognition.

[40] Guillaume Aimetti, et al. Modelling Early Language Acquisition Skills: Towards a General Statistical Learning Mechanism, 2009, EACL.

[41] Mark Johnson, et al. Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure, 2008, ACL.

[42] John Goldsmith, et al. An algorithm for the unsupervised learning of morphology, 2006, Natural Language Engineering.

[43] Mark Johnson, et al. Unsupervised Word Segmentation for Sesotho Using Adaptor Grammars, 2008, SIGMORPHON.

[44] Yu Zhang, et al. Joint Learning of Phonetic Units and Word Pronunciations for ASR, 2013, EMNLP.

[45] John A. Goldsmith, et al. Unsupervised Learning of the Morphology of a Natural Language, 2001, CL.

[46] Mark Johnson, et al. Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars, 2009, NAACL.

[47] James R. Glass, et al. Analysis and Processing of Lecture Audio Data: Preliminary Investigations, 2004, Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004 - SpeechIR '04.

[48] J. Pitman, et al. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator, 1997.

[49] Yaodong Zhang, et al. Unsupervised speech processing with applications to query-by-example spoken term detection, 2013.

[50] Mark Johnson, et al. Exploring the Role of Stress in Bayesian Word Segmentation using Adaptor Grammars, 2014, TACL.

[51] Michael C. Frank, et al. Learning and Long-Term Retention of Large-Scale Artificial Languages, 2013, PLoS ONE.

[52] E. Newport, et al. Word Segmentation: The Role of Distributional Cues, 1996.

[53] T. Griffiths, et al. A Bayesian framework for word segmentation: Exploring the effects of context, 2009, Cognition.

[54] T. Ferguson. A Bayesian Analysis of Some Nonparametric Problems, 1973.

[55] Mathias Creutz, et al. Unsupervised models for morpheme segmentation and morphology learning, 2007, TSLP.

[56] James R. Glass. A probabilistic framework for segment-based speech recognition, 2003, Comput. Speech Lang.