Paper Information - Exponential Priors for Maximum Entropy Models

Exponential Priors for Maximum Entropy Models

Maximum entropy models are a common modeling technique, but are prone to overfitting. We show that using an exponential distribution as a prior leads to bounded absolute discounting by a constant. We show that this prior is better motivated by the data than previous techniques such as a Gaussian prior, and often produces lower error rates. Exponential priors also lead to a simpler learning algorithm and to easier-to-understand behavior. Furthermore, exponential priors help explain the success of some previous smoothing techniques, and suggest simple variations that work better.
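The claim that an exponential prior yields "bounded absolute discounting by a constant" can be illustrated with a short derivation. This is a sketch under assumptions not stated in the abstract: a conditional maximum entropy model with weights $\lambda_i$ and feature functions $f_i$, training pairs $(x_j, y_j)$, and a per-feature rate $\alpha_i$ for the prior. The notation is illustrative and may differ from the paper's.

```latex
% Assumed model form (conditional maximum entropy / log-linear):
\[
P_\Lambda(y \mid x) =
  \frac{\exp\bigl(\sum_i \lambda_i f_i(x, y)\bigr)}
       {\sum_{y'} \exp\bigl(\sum_i \lambda_i f_i(x, y')\bigr)}
\]

% Assumed exponential prior on each weight, defined only for nonnegative weights:
\[
P(\lambda_i) = \alpha_i \, e^{-\alpha_i \lambda_i}, \qquad \lambda_i \ge 0
\]

% MAP estimation: log-likelihood plus log-prior (constants dropped).
\[
\hat{\Lambda} = \arg\max_{\Lambda \ge 0}
  \sum_j \log P_\Lambda(y_j \mid x_j) \;-\; \sum_i \alpha_i \lambda_i
\]

% Writing observed[i] = \sum_j f_i(x_j, y_j) and
% expected[i] = \sum_j \sum_y P_\Lambda(y \mid x_j) f_i(x_j, y),
% the partial derivative with respect to \lambda_i is
\[
\frac{\partial}{\partial \lambda_i}
  = \mathrm{observed}[i] - \mathrm{expected}[i] - \alpha_i
\]

% Optimality conditions with the constraint \lambda_i \ge 0:
\[
\begin{cases}
\mathrm{expected}[i] = \mathrm{observed}[i] - \alpha_i & \text{if } \lambda_i > 0 \\[4pt]
\mathrm{expected}[i] \ge \mathrm{observed}[i] - \alpha_i & \text{if } \lambda_i = 0
\end{cases}
\]
```

Under these assumptions, each observed feature count is discounted by the constant $\alpha_i$, and the discount is "bounded" in the sense that the equality relaxes to an inequality once the weight reaches zero. By comparison, a Gaussian prior with variance $\sigma_i^2$ gives a discount of $\lambda_i / \sigma_i^2$, which grows with the weight rather than staying constant.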

Joshua Goodman

[1] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters, 1953.

[2] J. Darroch, et al. Generalized Iterative Scaling for Log-Linear Models, 1972.

[3] William I. Newman, et al. Extension to the maximum entropy method, 1977, IEEE Trans. Inf. Theory.

[4] Ronald Rosenfeld, et al. Adaptive Statistical Language Modeling: A Maximum Entropy Approach, 1994.

[5] Raymond Lau, et al. Adaptive statistical language modeling, 1994.

[6] Hermann Ney, et al. On structuring probabilistic dependences in stochastic language modelling, 1994, Comput. Speech Lang.

[7] Peter M. Williams, et al. Bayesian Regularization and Pruning Using a Laplace Prior, 1995, Neural Computation.

[8] Hermann Ney, et al. Improved backing-off for M-gram language modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[9] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[10] Adam L. Berger, et al. A Maximum Entropy Approach to Natural Language Processing, 1996, CL.

[11] Adwait Ratnaparkhi, et al. A Maximum Entropy Approach to Identifying Sentence Boundaries, 1997, ANLP.

[12] John D. Lafferty, et al. Inducing Features of Random Fields, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[13] Mitchell P. Marcus, et al. Maximum entropy models for natural language ambiguity resolution, 1998.

[14] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[15] Ronald Rosenfeld, et al. A survey of smoothing techniques for ME models, 2000, IEEE Trans. Speech Audio Process.

[16] Michele Banko, et al. Mitigating the Paucity of Data Problem, 2001.

[17] Joshua Goodman, et al. Classes for fast maximum entropy training, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[18] Joshua Goodman, et al. Sequential Conditional Generalized Iterative Scaling, 2002, ACL.

[19] David Heckerman, et al. CFW: A Collaborative Filtering System Using Posteriors over Weights of Evidence, 2002, UAI.

[20] James Theiler, et al. Online Feature Selection using Grafting, 2003, ICML.