Unsupervised Word Discovery from Phonetic Input Using Nested Pitman-Yor Language Modeling

In this paper we consider unsupervised word discovery from phonetic input. We employ a word segmentation algorithm that simultaneously develops a lexicon, i.e., the transcription of each word as a phone sequence, learns an n-gram language model describing word and word-sequence probabilities, and carries out the segmentation itself. The underlying statistical model is the Pitman-Yor process, a concept from Bayesian non-parametrics that allows for an a priori unknown and unlimited number of different words. A hierarchy of Pitman-Yor processes supports language models of different orders, and nesting it with a second hierarchy of Pitman-Yor processes at the phone level allows unknown word unigrams to be backed off to phone m-grams. We present results on a large-vocabulary task, assuming an error-free phone sequence is given, and conclude by discussing options for coping with noisy phone sequences.
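
To make the backoff structure concrete, here is a minimal sketch of the predictive rule of a hierarchical Pitman-Yor language model in the standard Chinese-restaurant notation (following Teh, 2006); the symbols below are our shorthand and are not taken from the paper itself:

P(w \mid \mathbf{u}) \;=\; \frac{c_{\mathbf{u}w} - d_{|\mathbf{u}|}\, t_{\mathbf{u}w}}{\theta_{|\mathbf{u}|} + c_{\mathbf{u}\cdot}} \;+\; \frac{\theta_{|\mathbf{u}|} + d_{|\mathbf{u}|}\, t_{\mathbf{u}\cdot}}{\theta_{|\mathbf{u}|} + c_{\mathbf{u}\cdot}}\; P\bigl(w \mid \pi(\mathbf{u})\bigr)

Here \mathbf{u} is the word context, \pi(\mathbf{u}) the context with its earliest word removed, c_{\mathbf{u}w} and t_{\mathbf{u}w} are customer and table counts, and d_{|\mathbf{u}|}, \theta_{|\mathbf{u}|} are the discount and strength parameters for contexts of length |\mathbf{u}|. In the nested model, the base distribution reached at the word-unigram level is itself a phone-level hierarchical Pitman-Yor model, which is what allows previously unseen words to receive non-zero probability via phone m-grams.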
