An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems

This paper investigates error-entropy minimization in the training of adaptive systems. We prove the equivalence between minimizing the Rényi entropy of order α of the error and minimizing a Csiszár distance measure between the densities of the desired and system outputs. A nonparametric estimator of Rényi's entropy is presented, and it is shown that the global minimum of this estimator coincides with that of the actual entropy. The performance of the error-entropy-minimization criterion is compared with that of mean-square-error minimization in the short-term prediction of a chaotic time series and in nonlinear system identification.
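
As a concrete illustration of the criterion, below is a minimal sketch (not the authors' implementation) of error-entropy minimization for a linear prediction model: the error density is estimated with Gaussian Parzen windows, Rényi's quadratic entropy (the α = 2 case) is written as the negative log of the resulting information potential, and the weights are adapted by gradient descent on that entropy. The kernel size sigma, step size eta, toy data, and all function names are illustrative assumptions, not values taken from the paper.

import numpy as np

def information_potential_and_grad(e, X, sigma):
    # Parzen estimate of the information potential V = (1/N^2) * sum_ij G(e_i - e_j),
    # using Gaussian kernels, and its gradient w.r.t. the weights of a linear model
    # y = X @ w (so e = d - X @ w and de_i/dw = -x_i).
    N = len(e)
    diff = e[:, None] - e[None, :]                 # pairwise error differences
    k = np.exp(-diff**2 / (4.0 * sigma**2))        # Gaussian kernel of width sigma*sqrt(2)
    V = k.sum() / N**2
    gprime = -diff / (2.0 * sigma**2) * k          # derivative of the kernel at e_i - e_j
    # d(e_i - e_j)/dw = x_j - x_i, hence:
    dV_dw = (gprime[:, :, None] * (X[None, :, :] - X[:, None, :])).sum(axis=(0, 1)) / N**2
    return V, dV_dw

def train_entropy_minimization(X, d, sigma=0.5, eta=0.1, epochs=200):
    # Gradient descent on Renyi's quadratic entropy H2 = -log V, i.e. gradient
    # ascent on the information potential V.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        e = d - X @ w
        V, dV_dw = information_potential_and_grad(e, X, sigma)
        w += eta * dV_dw / V                       # -dH2/dw = (1/V) * dV/dw
    return w

if __name__ == "__main__":
    # Toy one-step prediction example (illustrative data, not the paper's experiments).
    rng = np.random.default_rng(0)
    s = np.sin(0.3 * np.arange(300)) + 0.05 * rng.standard_normal(300)
    order = 5
    X = np.array([s[i:i + order] for i in range(len(s) - order)])   # delay vectors
    d = s[order:]                                                   # one-step-ahead targets
    w = train_entropy_minimization(X, d)
    e = d - X @ w
    print("error variance:", e.var())

Note that the entropy of the error is invariant to its mean, so a sketch like this only concentrates the error distribution; in practice an output bias would be adjusted after training to drive the mean error to zero. That post-training step is an assumption of this illustration rather than text quoted from the abstract.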

[1] N. H. Anderson et al., Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates, 1994.

[2] Claude E. Shannon et al., A Mathematical Theory of Communication, 1948.

[3] V. Kvasnicka et al., Neural and Adaptive Systems: Fundamentals Through Simulations, 2001, IEEE Trans. Neural Networks.

[4] Paul A. Viola et al., Learning Informative Statistics: A Nonparametric Approach, 1999, NIPS.

[5] Norbert Wiener et al., Extrapolation, Interpolation, and Smoothing of Stationary Time Series, with Engineering Applications, 1949.

[6] C. Diks et al., Detecting differences between delay vector distributions, 1996, Physical Review E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics.

[7] S. Haykin et al., Making sense of a complex world [chaotic events modeling], 1998, IEEE Signal Process. Mag.

[8] Solomon Kullback et al., Information Theory and Statistics, 1970, The Mathematical Gazette.

[9] Alfréd Rényi et al., Probability Theory, 1970.

[10] Simon Haykin et al., Neural Networks: A Comprehensive Foundation, 1998.

[11] K. Loparo et al., Optimal state estimation for stochastic systems: an information theoretic approach, 1997, IEEE Trans. Autom. Control.

[12] Geoffrey E. Hinton et al., Learning internal representations by error propagation, 1986.

[13] Christian Jutten et al., Source separation techniques applied to linear prediction, 2000.

[14] E. Parzen, On Estimation of a Probability Density Function and Mode, 1962.

[15] Shun-ichi Amari et al., Differential-geometrical methods in statistics, 1985.

[16] L. Glass et al., Understanding Nonlinear Dynamics, 1995.

[17] Geoffrey E. Hinton et al., Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.

[18] John G. Proakis et al., Probability, Random Variables and Stochastic Processes, 1985, IEEE Trans. Acoust. Speech Signal Process.

[19] C. E. Shannon et al., A Mathematical Theory of Communication, 1948.

[20] Geoffrey E. Hinton et al., Learning representations by back-propagating errors, 1986, Nature.

[21] John W. Fisher et al., Learning from Examples with Information Theoretic Criteria, 2000, J. VLSI Signal Process.

[22] Deniz Erdoğmuş et al., Comparison of Entropy and Mean Square Error Criteria in Adaptive System Training Using Higher Order Statistics, 2004.

[23] S. Kullback et al., Information Theory and Statistics, 1959.