A Comparison of Algorithms for Maximum Entropy Parameter Estimation

Conditional maximum entropy (ME) models provide a general-purpose machine learning technique that has been applied successfully to fields as diverse as computer vision and econometrics, and that is used for a wide variety of classification problems in natural language processing. However, the flexibility of ME models is not without cost. While parameter estimation for ME models is conceptually straightforward, in practice ME models for typical natural language tasks are very large, and may contain many thousands of free parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the widely used iterative scaling algorithms perform quite poorly in comparison to the others, and on all of the test problems a limited-memory variable metric algorithm outperformed the other choices.
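
The gradient of the conditional log-likelihood of an ME model is simply the difference between the empirical and model expectations of the feature values, which is what makes general-purpose gradient-based optimizers applicable. As a minimal sketch of the setup the paper found fastest, the following snippet fits a toy conditional ME model with L-BFGS (a limited-memory variable metric method) via SciPy; the random data, feature tensor shape, and function names are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of conditional maximum entropy
# parameter estimation with a limited-memory variable metric optimizer.
# The toy dataset and all names below are hypothetical, for illustration.
import numpy as np
from scipy.optimize import minimize

# Toy data: N contexts, each with K candidate classes and F features.
rng = np.random.default_rng(0)
N, K, F = 200, 3, 5
features = rng.normal(size=(N, K, F))   # f(x, y) for each context/class pair
labels = rng.integers(0, K, size=N)     # observed class for each context

def neg_log_likelihood(theta):
    """Negative conditional log-likelihood and its gradient.

    p(y|x) = exp(theta . f(x, y)) / sum_y' exp(theta . f(x, y'))
    grad of the *negative* LL = E_model[f] - E_empirical[f]
    """
    scores = features @ theta                      # (N, K) dot products
    scores -= scores.max(axis=1, keepdims=True)    # for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)      # conditional distributions
    ll = np.log(probs[np.arange(N), labels]).sum()
    model_exp = np.einsum('nk,nkf->f', probs, features)
    empirical_exp = features[np.arange(N), labels].sum(axis=0)
    return -ll, model_exp - empirical_exp

# L-BFGS, the class of algorithm the paper found to outperform the others.
result = minimize(neg_log_likelihood, np.zeros(F), jac=True, method='L-BFGS-B')
print('converged:', result.success, 'final negative LL:', result.fun)
```

For contrast, generalized iterative scaling would instead update each parameter by a damped log-ratio of the empirical to model feature expectations; the paper's finding is that such updates converge far more slowly than the quasi-Newton steps taken above.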
