On the optimality of conditional expectation as a Bregman predictor

We consider the problem of predicting a random variable X from observations, denoted by a random variable Z. It is well known that the conditional expectation E[X|Z] is the optimal L^2 predictor (also known as the "least-mean-square error" predictor) of X among all (Borel measurable) functions of Z. In this correspondence, we provide necessary and sufficient conditions on general loss functions under which the conditional expectation is the unique optimal predictor. We show that E[X|Z] is the optimal predictor for all Bregman loss functions (BLFs), of which the L^2 loss function is a special case. Moreover, under mild conditions, we show that the BLFs are exhaustive, i.e., if for every random variable X, the infimum of E[F(X,y)] over all constants y is attained by the expectation E[X], then F is a BLF.
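The claim that the mean minimizes the expected loss for every BLF can be checked numerically on a small example. The sketch below is only an illustration, not the paper's method. It assumes the standard scalar definition of a Bregman loss, D(x, y) = phi(x) - phi(y) - phi'(y)(x - y) for a strictly convex differentiable phi, which the abstract itself does not spell out. The three-point distribution, the choice of three generators phi, the search grid, and all function names are hypothetical choices made for this example.

```python
import math

def bregman(phi, dphi, x, y):
    # Assumed standard definition: D(x, y) = phi(x) - phi(y) - phi'(y) * (x - y)
    return phi(x) - phi(y) - dphi(y) * (x - y)

# Hypothetical discrete random variable X on positive values
values = [1.0, 2.0, 4.0]
probs = [0.5, 0.25, 0.25]
mean = sum(p * v for p, v in zip(probs, values))

# Three strictly convex generators (phi, phi') on the positive reals
losses = {
    "squared": (lambda x: x * x, lambda x: 2.0 * x),
    "generalized_i_divergence": (lambda x: x * math.log(x), lambda x: math.log(x) + 1.0),
    "itakura_saito": (lambda x: -math.log(x), lambda x: -1.0 / x),
}

def expected_loss(phi, dphi, y):
    # E[D(X, y)] for the discrete distribution above
    return sum(p * bregman(phi, dphi, v, y) for p, v in zip(probs, values))

# Candidate constant predictors y on a grid from 0.5 to 5.0 in steps of 0.01
grid = [0.5 + 0.01 * k for k in range(451)]

# Grid point minimizing the expected loss, for each generator
best = {
    name: min(grid, key=lambda y: expected_loss(phi, dphi, y))
    for name, (phi, dphi) in losses.items()
}

for name, y_star in best.items():
    print(f"{name}: grid minimizer = {y_star:.2f}, E[X] = {mean:.2f}")
```

For each of the three generators the grid minimizer coincides with E[X], even though the Itakura-Saito and generalized I-divergence losses are not symmetric in their arguments. A grid search only illustrates the result on one distribution; it does not prove it.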
