On minimizing distortion and relative entropy

A common approach for estimating a probability mass function w when given a prior q and moment constraints given by Aw ≤ b is to minimize the relative entropy between w and q subject to the set of linear constraints. In such cases, the solution w is known to have exponential form. We consider the case in which the linear constraints are noisy, uncertain, infeasible, or otherwise "soft." A solution can then be obtained by minimizing both the relative entropy and violation of the constraints Aw ≤ b. A penalty parameter σ weights the relative importance of these two objectives. We show that this penalty formulation also yields a solution w with exponential form. If the distortion is based on an ℓ_p norm, then the exponential form of w is shown to have exponential decay parameters that are bounded as a function of σ. We also state conditions under which the solution w to the penalty formulation will result in zero distortion, so that the moment constraints hold exactly. These properties are useful in choosing penalty parameters, evaluating the impact of chosen penalty parameters, and proving properties about methods that use such penalty formulations.
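As an illustration of the penalty formulation, the sketch below works through the simplest special case we could construct: a single constraint row a (so the constraint reads a·w ≤ b) and a hinge (ℓ_1-type) distortion max(0, a·w − b). These choices, the function names, and the bisection solver are assumptions made for this example and are not taken from the paper. Under these assumptions the minimizer of D(w‖q) + σ·max(0, a·w − b) has the form w_i ∝ q_i·exp(−λ·a_i) with 0 ≤ λ ≤ σ, which mirrors the bounded decay parameter described above.

```python
import math

def tilt(q, a, lam):
    """Exponential-form pmf: w_i proportional to q_i * exp(-lam * a_i)."""
    unnorm = [qi * math.exp(-lam * ai) for qi, ai in zip(q, a)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def moment(w, a):
    """Expected value of a under the pmf w, i.e. the dot product a.w."""
    return sum(wi * ai for wi, ai in zip(w, a))

def soft_min_relative_entropy(q, a, b, sigma, iters=200):
    """Illustrative solver (hypothetical helper, one constraint only).

    Minimizes D(w||q) + sigma * max(0, a.w - b) over pmfs w by
    maximizing the concave dual over the interval 0 <= lam <= sigma.
    Returns the pmf w and its decay parameter lam.
    """
    if moment(q, a) <= b:
        # The prior already satisfies the constraint: no tilt is needed.
        return list(q), 0.0
    w_sigma = tilt(q, a, sigma)
    if moment(w_sigma, a) >= b:
        # The penalty is too weak to reach feasibility: lam sits at its bound.
        return w_sigma, sigma
    # Otherwise the constraint can be met exactly; bisect for a.w = b.
    lo, hi = 0.0, sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment(tilt(q, a, mid), a) > b:
            lo = mid
        else:
            hi = mid
    return tilt(q, a, hi), hi
```

In this sketch a small σ leaves the decay parameter pinned at σ with some remaining constraint violation, while a sufficiently large σ gives zero distortion, so the moment constraint holds exactly up to the bisection tolerance.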
