How Many Machines Can We Use in Parallel Computing for Kernel Ridge Regression

This paper aims to solve a basic problem in distributed statistical inference: how many machines can we use in parallel computing? In kernel ridge regression, we address this question in two important settings: nonparametric estimation and hypothesis testing. Specifically, we find a range for the number of machines under which optimal estimation/testing is achievable. The empirical process method we employ provides a unified framework that allows us to handle various regression problems (such as thin-plate splines and nonparametric additive regression) under different settings (such as univariate, multivariate and diverging-dimensional designs). It is worth noting that the upper bounds on the number of machines are proven to be unimprovable (up to a logarithmic factor) in two important cases: smoothing spline regression and Gaussian RKHS regression. Our theoretical findings are backed by thorough numerical studies.

Guang Cheng | Meimei Liu | Zuofeng Shang
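
The parallel setting described in the abstract can be illustrated with a minimal sketch of divide-and-conquer kernel ridge regression: the sample is split across machines, each machine fits a local kernel ridge regression, and the local fits are averaged. The abstract does not spell out the estimator, so the averaging scheme, the Gaussian kernel, the bandwidth, the penalty `lam`, the test function, and all function names below are assumptions chosen for illustration. The sketch does not reproduce the paper's bounds on the number of machines.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=0.2):
    # Gaussian kernel on the real line; the bandwidth is an illustrative choice.
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_krr(xs, ys, lam):
    """Local kernel ridge regression: solve (K + n*lam*I) alpha = y."""
    n = len(xs)
    K = [[gaussian_kernel(xs[i], xs[j]) + (n * lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_linear(K, ys)
    return lambda x: sum(a * gaussian_kernel(x, xi) for a, xi in zip(alpha, xs))


def dc_krr(xs, ys, num_machines, lam):
    """Split the data into num_machines blocks, fit each block, average the fits."""
    blocks = [(xs[k::num_machines], ys[k::num_machines])
              for k in range(num_machines)]
    local_fits = [fit_krr(bx, by, lam) for bx, by in blocks]
    return lambda x: sum(f(x) for f in local_fits) / len(local_fits)


def demo(n=200, num_machines=4, lam=1e-3, seed=0):
    """Return the mean squared error of the averaged fit on an interior grid."""
    rng = random.Random(seed)
    truth = lambda x: math.sin(2.0 * math.pi * x)  # hypothetical regression function
    xs = [rng.random() for _ in range(n)]
    ys = [truth(x) + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
    f_bar = dc_krr(xs, ys, num_machines, lam)
    grid = [i / 50.0 for i in range(1, 50)]
    return sum((f_bar(x) - truth(x)) ** 2 for x in grid) / len(grid)


if __name__ == "__main__":
    for s in (1, 4, 10):
        print(s, "machines, grid MSE:", demo(num_machines=s))
```

Running the script with different values of `num_machines` gives a rough feel for how the accuracy of the averaged estimator changes as the sample is split more finely, which is the trade-off the paper studies.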
