Relative Loss Bounds for On-Line Density Estimation with the Exponential Family of Distributions

We consider on-line density estimation with a parameterized density from the exponential family. The on-line algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example, the algorithm incurs a loss, which is the negative log-likelihood of the example with respect to the current parameter of the algorithm. An off-line algorithm can choose the best parameter based on all the examples. We prove bounds on the additional total loss of the on-line algorithm over the total loss of the best off-line parameter. These relative loss bounds hold for an arbitrary sequence of examples. The goal is to design algorithms with the best possible relative loss bounds. We use Bregman divergences to derive and analyze each algorithm; these divergences are relative entropies between two distributions from the exponential family. We also use our methods to prove relative loss bounds for linear regression.
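As a rough illustration of the protocol described above (not the paper's exact algorithm or analysis), the following sketch runs on-line density estimation for a unit-variance Gaussian, a simple member of the exponential family. The parameter is the running average of the past examples, the per-trial loss is the negative log-likelihood under the current parameter, and the total on-line loss is compared against the loss of the best fixed parameter in hindsight. The function name `online_vs_offline_gaussian` and the initial guess `x0` are assumptions introduced here for the example.

```python
import numpy as np


def online_vs_offline_gaussian(xs, x0=0.0):
    """Sketch of the on-line density estimation protocol for N(mu, 1).

    The on-line parameter mu is the running average of the examples seen so
    far, seeded with a hypothetical initial guess x0. The off-line comparator
    is the single best mean in hindsight, i.e. the average of all examples.
    """
    def nll(x, mu):
        # Negative log-likelihood of x under a unit-variance Gaussian N(mu, 1).
        return 0.5 * (x - mu) ** 2 + 0.5 * np.log(2 * np.pi)

    mu, count = x0, 1            # treat x0 as one pseudo-example so the average is defined
    online_loss = 0.0
    for x in xs:
        online_loss += nll(x, mu)    # incur the loss before seeing how to update...
        count += 1
        mu += (x - mu) / count       # ...then fold the example into the running average

    best_mu = np.mean(xs)                                 # best off-line parameter
    offline_loss = sum(nll(x, best_mu) for x in xs)
    return online_loss, offline_loss                      # difference = relative (regret) loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.normal(loc=2.0, scale=1.0, size=1000)
    on, off = online_vs_offline_gaussian(xs)
    print(f"on-line loss {on:.1f}, off-line loss {off:.1f}, regret {on - off:.2f}")
```

For this Gaussian case the relative loss stays small (logarithmic in the sequence length for well-behaved data) even though the individual losses grow linearly; the paper's contribution is to prove such bounds for arbitrary example sequences across the exponential family, using Bregman divergences in the analysis.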
