Stochastic Gradient Algorithm Under (h,φ)-Entropy Criterion

Abstract: Motivated by the work of Erdogmus and Principe, we adopt the error (h,φ)-entropy as the supervised adaptation criterion. Several properties of the (h,φ)-entropy criterion, and its connections with traditional error criteria, are investigated. Using a kernel density estimation approach, we obtain a nonparametric estimator of the instantaneous (h,φ)-entropy. We then develop the general stochastic information gradient algorithm and derive an approximate upper bound on the step size for adaptive linear neuron (ADALINE) training. Moreover, the (h,φ) pair is optimized to improve the performance of the proposed algorithm. For finite impulse response (FIR) identification with white Gaussian input and noise, the exact optimum φ function is derived. Finally, simulation experiments verify these results and demonstrate the noticeable performance improvement achievable with the optimum (h,φ)-entropy criterion.
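To illustrate the stochastic information gradient idea underlying the paper, the sketch below trains an ADALINE by descending a Parzen (Gaussian-kernel) estimate of the instantaneous Shannon error entropy, the special case h(x) = x, φ(x) = −x log x of the (h,φ) family. This is a minimal reconstruction under assumed details: the function name `sig_adaline`, the sliding-window size `L`, and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def gauss_kernel(u, sigma):
    """Gaussian kernel with bandwidth sigma."""
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def gauss_kernel_deriv(u, sigma):
    """Derivative of the Gaussian kernel with respect to u."""
    return -u / sigma**2 * gauss_kernel(u, sigma)

def sig_adaline(X, d, L=10, sigma=1.0, eta=0.02, n_epochs=5):
    """Stochastic information gradient (Shannon case) for ADALINE.

    Per sample, the instantaneous entropy is estimated as
    H_k = -log p_hat(e_k), with p_hat a kernel density estimate
    over the L most recent errors, and w is updated by descending
    its gradient.
    """
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(n_epochs):
        # Seed the error window with the initial weights.
        errors = list(d[:L] - X[:L] @ w)
        for k in range(L, n):
            e_k = d[k] - X[k] @ w
            window = np.array(errors[-L:])          # errors e_{k-L}..e_{k-1}
            diffs = e_k - window
            p_hat = gauss_kernel(diffs, sigma).mean()
            # d(e_k - e_i)/dw = -(x_k - x_i), since e = d - w.T x.
            grad_p = gauss_kernel_deriv(diffs, sigma)[:, None] * (X[k - L:k] - X[k])
            grad_H = -grad_p.mean(axis=0) / max(p_hat, 1e-12)
            w -= eta * grad_H
            errors.append(e_k)
    return w
```

Because entropy is insensitive to a constant offset in the error, this update shapes the error distribution rather than directly forcing zero mean; for FIR identification with zero-mean white Gaussian input and noise, as in the paper's setting, it still drives the weights toward the true filter.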
