Computational Limits of A Distributed Algorithm for Smoothing Spline

In this paper, we explore the statistical versus computational trade-off to address a basic question in the application of a distributed algorithm: what is the minimal computational cost of obtaining statistical optimality? In the smoothing spline setup, we observe a phase transition phenomenon in the number of deployed machines, which serves as a simple proxy for computing cost. Specifically, we establish a sharp upper bound on the number of machines: when the number is below this bound, statistical optimality (in terms of nonparametric estimation or testing) is achievable; otherwise, statistical optimality becomes impossible. These sharp bounds partly capture the intrinsic computational limits of the distributed algorithm considered in this paper, and turn out to be fully determined by the smoothness of the regression function. As a side remark, we argue that sample splitting may be viewed as an alternative form of regularization, playing a role similar to that of the smoothing parameter.
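To make the setting concrete, below is a minimal sketch of a generic divide-and-conquer scheme. It rests on assumptions not taken from the text above: the N observations are split at random into s subsamples (one per machine), each machine fits a penalized estimator on its own subsample, and the s local estimates are averaged. As a stand-in for a smoothing spline, each local fit is kernel ridge regression with the kernel K(u, v) = 1 + min(u, v) on [0, 1], whose RKHS norm is f(0)^2 + ∫ f'(t)^2 dt (a first-order Sobolev penalty). The function names, the kernel choice, and the test function sin(2πx) are hypothetical illustrations, not the paper's method.

```python
import math
import random


def sobolev_kernel(u, v):
    # Assumed stand-in kernel: reproducing kernel of the first-order Sobolev
    # space on [0, 1] under the norm f(0)^2 + integral of f'(t)^2.
    return 1.0 + min(u, v)


def solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_local(xs, ys, lam):
    # Minimize (1/n) sum (y_i - f(x_i))^2 + lam * ||f||^2 over the RKHS.
    # By the representer theorem f = sum_j alpha_j K(., x_j) with
    # alpha = (K + n * lam * I)^{-1} y.
    n = len(xs)
    gram = [[sobolev_kernel(xi, xj) + (n * lam if i == j else 0.0)
             for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    alpha = solve(gram, ys)
    return lambda t: sum(a * sobolev_kernel(t, xj) for a, xj in zip(alpha, xs))


def divide_and_conquer(xs, ys, s, lam, seed=0):
    # Hypothetical helper: randomly split the data into s subsamples of
    # (roughly) equal size, fit each one with the same lam, average the fits.
    if not 1 <= s <= len(xs):
        raise ValueError("s must lie between 1 and the sample size")
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    fits = []
    for k in range(s):
        part = idx[k::s]
        fits.append(fit_local([xs[i] for i in part], [ys[i] for i in part], lam))
    return lambda t: sum(f(t) for f in fits) / s


if __name__ == "__main__":
    rng = random.Random(1)
    n_total = 120
    xs = [rng.random() for _ in range(n_total)]
    ys = [math.sin(2 * math.pi * x) + 0.3 * rng.gauss(0.0, 1.0) for x in xs]
    grid = [i / 50 for i in range(51)]
    for s in (1, 4, 24):
        fhat = divide_and_conquer(xs, ys, s, lam=1e-3)
        mse = sum((fhat(t) - math.sin(2 * math.pi * t)) ** 2 for t in grid) / len(grid)
        print(f"s = {s:2d}: grid MSE = {mse:.4f}")
```

Varying `s` with `lam` held fixed lets one watch how the error of the averaged estimator changes as the data are split more finely. Keeping the same `lam` on every subsample, tuned to the full sample, is one common choice in the divide-and-conquer literature; the sketch makes no claim about the specific bounds on the number of machines derived in the paper.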
