How Many Machines Can We Use in Parallel Computing for Kernel Ridge Regression?

This paper aims to solve a basic problem in distributed statistical inference: how many machines can we use in parallel computing? For kernel ridge regression, we address this question in two important settings: nonparametric estimation and hypothesis testing. Specifically, we find a range for the number of machines under which optimal estimation and testing are achievable. The empirical processes method we employ provides a unified framework that allows us to handle various regression problems (such as thin-plate splines and nonparametric additive regression) under different settings (such as univariate, multivariate, and diverging-dimensional designs). It is worth noting that the upper bounds on the number of machines are proven to be unimprovable (up to a logarithmic factor) in two important cases: smoothing spline regression and Gaussian RKHS regression. Our theoretical findings are backed by thorough numerical studies.
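
To make the setting concrete, the following is a minimal sketch of divide-and-conquer kernel ridge regression, assuming the common scheme in which the data are split across s machines, a kernel ridge estimator is fitted on each subset, and the local predictions are averaged. The abstract does not spell out the algorithm, so the function names, the Gaussian kernel, the bandwidth, and the penalty `lam` are illustrative assumptions rather than the paper's choices.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian kernel matrix between 1-D input arrays a (n,) and b (m,)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def fit_krr(x, y, lam, bandwidth=0.2):
    """Solve (K + n * lam * I) alpha = y for the local KRR coefficients."""
    n = x.shape[0]
    K = gaussian_kernel(x, x, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)


def dc_krr_predict(x, y, x_new, s, lam, bandwidth=0.2):
    """Average the predictions of s local KRR fits (hypothetical helper).

    s plays the role of the number of machines; each "machine" here is
    simply one block of the index set.
    """
    blocks = np.array_split(np.arange(x.shape[0]), s)
    preds = []
    for idx in blocks:
        alpha = fit_krr(x[idx], y[idx], lam, bandwidth)
        preds.append(gaussian_kernel(x_new, x[idx], bandwidth) @ alpha)
    return np.mean(preds, axis=0)


if __name__ == "__main__":
    # Synthetic data: an assumed toy regression function, not from the paper.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=600)
    y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(600)
    grid = np.linspace(0.0, 1.0, 50)
    for s in (1, 4, 20):
        fit = dc_krr_predict(x, y, grid, s=s, lam=1e-3)
        mse = np.mean((fit - np.sin(2.0 * np.pi * grid)) ** 2)
        print(f"s={s:3d}  grid MSE={mse:.5f}")
```

Increasing s reduces the per-machine cost of the linear solve, which is cubic in the local sample size, at the price of fitting each local estimator on fewer observations; the question posed in the abstract is how large s can grow before the averaged estimator loses optimality.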
