Distributed kernel gradient descent algorithm for the minimum error entropy principle

Abstract. Distributed learning based on the divide-and-conquer approach is a powerful tool for big data processing. We introduce a distributed kernel gradient descent algorithm for the minimum error entropy principle and analyze its convergence. We show that under mild conditions the L² error decays at a minimax optimal rate. As a tool, we establish concentration inequalities for U-statistics, which play a pivotal role in our error analysis.
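
To make the setup concrete, below is a minimal sketch of divide-and-conquer kernel gradient descent under the minimum error entropy (MEE) principle. It maximizes the empirical information potential V(f) = (1/n²) Σ_{i,j} G_h(e_i − e_j) with residuals e_i = y_i − f(x_i), which is equivalent to minimizing the Parzen-window estimate of Rényi's quadratic error entropy. The Gaussian RKHS kernel, the Gaussian window of bandwidth h, the fixed step size eta, the iteration count T, and the function names (local_mee_kgd, distributed_mee) are all illustrative assumptions, not the paper's exact algorithm or parameter schedule.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    # RKHS kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); X, Z are 2-D arrays.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def local_mee_kgd(X, y, T=200, eta=0.05, h=1.0, sigma=1.0):
    """Kernel gradient ascent on the empirical information potential
    V(f) = (1/n^2) sum_{i,j} G_h(e_i - e_j),  e_i = y_i - f(x_i),
    with f represented as f(x) = sum_i alpha_i K(x_i, x)."""
    n = len(y)
    K = gaussian_kernel(X, X, sigma)
    alpha = np.zeros(n)
    for _ in range(T):
        e = y - K @ alpha                          # residuals e_i
        D = e[:, None] - e[None, :]                # pairwise differences e_i - e_j
        W = D / h**2 * np.exp(-D**2 / (2 * h**2))  # equals -G_h'(e_i - e_j)
        # Functional gradient of V lies in span{K_{x_i}}; its coefficient on
        # K_{x_i} is (1/n^2) * (sum_j W_ij - sum_j W_ji).
        g = (W.sum(axis=1) - W.sum(axis=0)) / n**2
        alpha = alpha + eta * g                    # ascent step on V
    return alpha

def distributed_mee(X, y, X_test, m=4, **kw):
    """Divide-and-conquer: run local kernel gradient descent on m disjoint
    subsets, then average the m local estimators on the test inputs."""
    parts = np.array_split(np.random.permutation(len(y)), m)
    preds = []
    for idx in parts:
        alpha = local_mee_kgd(X[idx], y[idx], **kw)
        preds.append(gaussian_kernel(X_test, X[idx], kw.get("sigma", 1.0)) @ alpha)
    return np.mean(preds, axis=0)
```

The averaged output corresponds to the standard divide-and-conquer estimator f̄ = (1/m) Σ_j f_j. One caveat worth noting: error entropy is invariant under a constant shift of the residuals, so MEE determines the regression function only up to a constant; in the MEE literature a bias term is typically adjusted afterwards, e.g., by centering the residuals.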
