Risk Analysis of Divide-and-Conquer ERM

Theoretical analysis of divide-and-conquer-based distributed learning with the least squares loss in a reproducing kernel Hilbert space (RKHS) has recently been explored within the framework of learning theory. However, studies of learning theory for general loss functions and hypothesis spaces remain limited. To fill this gap, we study the risk performance of distributed empirical risk minimization (ERM) for general loss functions and hypothesis spaces. The main contributions are twofold. First, we derive two risk bounds with optimal rates under basic assumptions on the hypothesis space, together with smoothness, Lipschitz continuity, and strong convexity assumptions on the loss function. Second, we develop two more general risk bounds for distributed ERM that do not require strong convexity.
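
To make the setting concrete, the following is a minimal sketch of divide-and-conquer ERM under one simple instantiation: a linear hypothesis space with the regularized least squares loss, which is smooth and strongly convex for lam > 0 (the case covered by the first pair of bounds). The function names, the closed-form local solver, and the parameter choices are illustrative assumptions, not details from the paper.

```python
import numpy as np

def local_erm(X, y, lam=1e-2):
    # Regularized least-squares ERM on one data block:
    # minimize (1/n) * ||X w - y||^2 + lam * ||w||^2 over w,
    # solved in closed form via the normal equations.
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def divide_and_conquer_erm(X, y, m, lam=1e-2):
    # Split the sample into m disjoint blocks, run ERM on each block
    # independently, and average the m local minimizers.
    blocks = zip(np.array_split(X, m), np.array_split(y, m))
    return np.mean([local_erm(Xj, yj, lam) for Xj, yj in blocks], axis=0)

# Usage: synthetic linear data distributed over m = 4 machines.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(1000)
w_bar = divide_and_conquer_erm(X, y, m=4)
```

Averaging the local minimizers is the standard divide-and-conquer step; the excess risk of an averaged estimator of this kind is the quantity the bounds above concern.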
