The Maximum Entropy Relaxation Path

The relaxed maximum entropy problem is concerned with finding a probability distribution on a finite set that minimizes the relative entropy to a given prior distribution, while satisfying relaxed max-norm constraints with respect to a third, observed multinomial distribution. We study the entire relaxation path for this problem in detail. We establish the existence of the relaxation path and give a geometric description of it: the maximum entropy relaxation path admits a planar geometric description as an increasing, piecewise linear function of the inverse relaxation parameter. We derive fast algorithms for tracking the path. In various realistic settings, our algorithms require $O(n \log n)$ operations for probability distributions on $n$ points, making it possible to handle large problems. Once the path has been recovered, we show that, given a validation set, the family of admissible models is reduced from an infinite family to a small discrete set. We demonstrate the merits of our approach in experiments with synthetic data and discuss its potential for the estimation of compact $n$-gram language models.
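
Concretely, writing $\Delta_n$ for the probability simplex on $n$ points, $q$ for the prior distribution, $\hat{p}$ for the observed multinomial distribution, and $\nu \ge 0$ for the relaxation parameter (notation introduced here for illustration; the paper's own symbols may differ), the relaxed problem described above can be sketched as

$$
\min_{p \in \Delta_n} \; \mathrm{D}\!\left(p \,\|\, q\right)
\qquad \text{subject to} \qquad
\left\lVert p - \hat{p} \right\rVert_{\infty} \le \nu .
$$

At $\nu = 0$ the constraint pins $p$ to $\hat{p}$, while for sufficiently large $\nu$ the constraint becomes inactive and the minimizer is the prior $q$ itself; the relaxation path is the family of solutions $p^{\star}(\nu)$ traced out between these two extremes.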
