Consistency Analysis of an Empirical Minimum Error Entropy Algorithm

In this paper we study the consistency of an empirical minimum error entropy (MEE) algorithm in a regression setting. We introduce two types of consistency. The error entropy consistency, which requires the error entropy of the learned function to approximate the minimum error entropy, is shown to always hold if the bandwidth parameter tends to 0 at an appropriate rate. The regression consistency, which requires the learned function to approximate the regression function, is however a more subtle issue. We prove that error entropy consistency implies regression consistency for homoskedastic models in which the noise is independent of the input variable. For heteroskedastic models, however, a counterexample shows that the two types of consistency do not coincide. A surprising result is that regression consistency always holds, provided that the bandwidth parameter tends to infinity at an appropriate rate. Regression consistency for two classes of special models is shown to hold with a fixed bandwidth parameter, which further illustrates the complexity of the regression consistency of MEE. The Fourier transform plays a crucial role in our analysis.
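For orientation, the following is a minimal sketch of the empirical MEE objective in the information-theoretic learning framework of Principe, based on Renyi's quadratic entropy estimated by Parzen windowing; the window G, bandwidth parameter h, sample size m, and hypothesis space \mathcal{H} are notation introduced here for illustration and need not match the paper's exact formulation.

\[
  \widehat{H}_h(f) \;=\; -\log\Bigg(\frac{1}{m^2 h}\sum_{i=1}^{m}\sum_{j=1}^{m} G\Big(\frac{(e_i - e_j)^2}{2h^2}\Big)\Bigg),
  \qquad e_i = y_i - f(x_i),
  \qquad f_{\mathbf z} \in \arg\min_{f \in \mathcal{H}} \widehat{H}_h(f).
\]

Under this notation, error entropy consistency asks that the population error entropy of f_{\mathbf z} converge to the minimum error entropy as the sample size grows, while regression consistency asks that f_{\mathbf z} converge, up to an additive constant (the error entropy is invariant under shifts of f), to the regression function f_\rho(x) = E[Y \mid X = x].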
