A Fast Variational Approach for Learning Markov Random Field Language Models

Language modelling is a fundamental building block of natural language processing. In practice, however, the size of the vocabulary limits which distributions can be applied to this task: one must either resort to local optimization methods, such as those used in neural language models, or work with heavily constrained distributions. In this work, we take a step towards overcoming these difficulties. We present a method for global-likelihood optimization of a Markov random field language model that exploits long-range contexts in time independent of the corpus size. We take a variational approach to optimizing the likelihood and exploit underlying symmetries to greatly simplify learning. We demonstrate the efficiency of this method both for language modelling and for part-of-speech tagging.
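To make the variational idea concrete, the sketch below shows why global-likelihood training of an MRF is hard and how a variational bound helps: the gradient of the log-likelihood requires the log-partition function log Z, which is intractable at scale, so one replaces it with a tractable surrogate computed from a factorized distribution. This is a minimal illustration using naive mean-field on a toy bigram MRF; the toy model, sizes, and variable names are our own assumptions, and the paper's actual method relies on tree-reweighted bounds and symmetry (lifted) arguments rather than mean-field.

```python
import math
import itertools
import random

V = 3  # toy vocabulary size (assumption for illustration)
T = 4  # toy sentence length

random.seed(0)
# Pairwise potentials theta[a][b] scoring adjacent words (a, b) in a chain MRF:
# p(x) proportional to exp(sum_t theta[x_t][x_{t+1}])
theta = [[random.uniform(-1.0, 1.0) for _ in range(V)] for _ in range(V)]

def score(x):
    """Unnormalized log-probability of a word sequence x."""
    return sum(theta[x[t]][x[t + 1]] for t in range(T - 1))

# Exact log-partition by brute-force enumeration -- feasible only for toy
# sizes; the whole point of variational methods is to avoid this sum.
logZ = math.log(sum(math.exp(score(x))
                    for x in itertools.product(range(V), repeat=T)))

# Mean-field family: q(x) = prod_t q[t][x_t], initialized uniformly.
q = [[1.0 / V] * V for _ in range(T)]

def elbo():
    """Evidence lower bound E_q[score] + H(q) <= log Z."""
    energy = sum(q[t][a] * q[t + 1][b] * theta[a][b]
                 for t in range(T - 1) for a in range(V) for b in range(V))
    entropy = -sum(p * math.log(p) for row in q for p in row if p > 0)
    return energy + entropy

# Coordinate ascent: each update sets q[t] proportional to the exponentiated
# expected score under the neighbouring factors, which cannot decrease the ELBO.
for _ in range(50):
    for t in range(T):
        logits = []
        for v in range(V):
            s = 0.0
            if t > 0:
                s += sum(q[t - 1][a] * theta[a][v] for a in range(V))
            if t < T - 1:
                s += sum(q[t + 1][b] * theta[v][b] for b in range(V))
            logits.append(s)
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        q[t] = [w / z for w in weights]

print("exact log Z:", logZ)
print("mean-field ELBO:", elbo())  # always <= exact log Z
```

In training, the surrogate bound (here the ELBO; in the paper, a tree-reweighted upper bound) stands in for log Z inside the likelihood, so each gradient step costs one tractable inference rather than a sum over all word sequences.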
