A distributed one-step estimator

Distributed statistical inference has recently attracted enormous attention. Much of the existing work focuses on averaging estimators, e.g., Zhang et al. (J Mach Learn Res 14:3321–3363, 2013), among many others. We propose a one-step approach that enhances the simple-averaging distributed estimator with a single Newton–Raphson update. We derive the asymptotic properties of the newly proposed estimator and find that it enjoys the same asymptotic properties as the idealized centralized estimator. In particular, asymptotic normality is established for the proposed estimator, whereas competing estimators may not enjoy this property. The one-step approach requires only one additional round of communication relative to the averaging estimator, so the extra communication burden is insignificant. It also yields a lower upper bound on the mean squared error than the alternatives. In finite samples, numerical examples show that the proposed estimator outperforms the simple averaging estimator by a large margin in terms of the sample mean squared error. A potential application of the one-step approach is to use multiple machines to speed up large-scale statistical inference with little compromise in the quality of the estimators. The proposed method is particularly valuable when the data are only available on distributed machines with limited communication bandwidth.
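The abstract describes the procedure only at a high level. The following minimal sketch (Python with NumPy) illustrates the general idea for distributed logistic regression: each machine computes a local maximum-likelihood estimate, the estimates are averaged, and a single Newton–Raphson update is then applied using gradients and Hessians aggregated across machines. The logistic-regression setting, function names, and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def grad_hess(theta, X, y):
    """Log-likelihood gradient (score) and Hessian for logistic regression on one block of data."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = X.T @ (y - p)                  # score vector
    hess = -(X.T * (p * (1.0 - p))) @ X   # Hessian of the log-likelihood
    return grad, hess

def local_mle(X, y, iters=25):
    """Local maximum-likelihood estimate via Newton-Raphson on a single machine."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        g, H = grad_hess(theta, X, y)
        theta -= np.linalg.solve(H, g)
    return theta

def one_step_estimator(blocks):
    """Simple-averaging estimator followed by one global Newton-Raphson refinement.

    `blocks` is a list of (X_k, y_k) pairs, one per machine.
    Round 1: each machine sends its local estimate; the center averages them.
    Round 2: each machine sends its gradient and Hessian evaluated at the
    averaged estimate; the center performs a single Newton-Raphson update.
    """
    theta_bar = np.mean([local_mle(X, y) for X, y in blocks], axis=0)
    grads, hessians = zip(*(grad_hess(theta_bar, X, y) for X, y in blocks))
    return theta_bar - np.linalg.solve(sum(hessians), sum(grads))

# Toy usage: data split across 4 hypothetical machines.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
blocks = []
for _ in range(4):
    X = rng.normal(size=(500, 3))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_true)))
    blocks.append((X, y))
print(one_step_estimator(blocks))
```

In this sketch the second round communicates only one gradient vector and one Hessian matrix per machine, which is the "one additional round of communication" referred to above.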

[1] David P. Woodruff et al. Improved Distributed Principal Component Analysis. NIPS, 2014.

[2] F. Liang et al. A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. 2015.

[3] M. Wainwright. Constrained forms of statistical minimax: computation, communication, and privacy. 2014.

[4] Ohad Shamir et al. Communication Complexity of Distributed Convex Learning and Optimization. NIPS, 2015.

[5] Christopher Frost et al. Spanner: Google's Globally-Distributed Database. OSDI, 2012.

[6] H. Zou et al. One-step Sparse Estimates in Nonconcave Penalized Likelihood Models. Annals of Statistics, 2008.

[7] Martin J. Wainwright et al. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. NIPS, 2013.

[8] Jonathan D. Rosenblatt et al. On the Optimality of Averaging in Distributed Statistical Learning. arXiv:1407.2724, 2014.

[9] Minge Xie et al. A Split-and-Conquer Approach for Analysis of Extraordinarily Large Data. 2014.

[10] Purnamrita Sarkar et al. A scalable bootstrap for massive data. arXiv:1112.5016, 2011.

[11] Jianqing Fan et al. Distributed Estimation and Inference with Statistical Guarantees. arXiv:1509.05457, 2015.

[12] Han Liu et al. A Partially Linear Framework for Massive Heterogeneous Data. Annals of Statistics, 2014.

[13] Lifeng Lai et al. Are Slepian-Wolf rates necessary for distributed parameter estimation? 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2015.

[14] Yun Yang et al. Communication-Efficient Distributed Statistical Inference. Journal of the American Statistical Association, 2016.

[15] Gideon S. Mann et al. Distributed Training Strategies for the Structured Perceptron. NAACL, 2010.

[16] P. Bickel. One-Step Huber Estimates in the Linear Model. 1975.

[17] S. Lang. Real and Functional Analysis. 1983.

[18] Stephen P. Boyd et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2011.

[19] Robert D. Nowak et al. Distributed EM algorithms for density estimation and clustering in sensor networks. IEEE Transactions on Signal Processing, 2003.

[20] Qiang Liu et al. Distributed Estimation, Information Loss and Exponential Families. NIPS, 2014.

[21] Chong Wang et al. Asymptotically Exact, Embarrassingly Parallel MCMC. UAI, 2013.

[22] Ohad Shamir et al. Communication-Efficient Distributed Optimization using an Approximate Newton-type Method. ICML, 2013.

[23] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. 1996.

[24] Ohad Shamir et al. Optimal Distributed Online Prediction Using Mini-Batches. Journal of Machine Learning Research, 2010.

[25] H. Kushner et al. Stochastic Approximation and Recursive Algorithms and Applications. 2003.

[26] Alexander Shapiro et al. Lectures on Stochastic Programming: Modeling and Theory. 2009.

[27] Maria-Florina Balcan et al. Distributed Learning, Communication Complexity and Privacy. COLT, 2012.

[28] Yuhong Yang et al. Information-theoretic determination of minimax rates of convergence. 1999.

[29] Qiang Liu et al. Communication-efficient sparse regression: a one-shot approach. arXiv, 2015.

[30] Niklas Carlsson et al. Characterizing web-based video sharing workloads. WWW, 2009.

[31] Martin J. Wainwright et al. Communication-efficient algorithms for statistical optimization. IEEE Conference on Decision and Control (CDC), 2012.

[32] Thomas Hofmann et al. Communication-Efficient Distributed Dual Coordinate Ascent. NIPS, 2014.

[33] H. Rosenthal. On the subspaces of L^p (p > 2) spanned by sequences of independent random variables. 1970.

[34] Alexander J. Smola et al. Parallelized Stochastic Gradient Descent. NIPS, 2010.

[35] Jianqing Fan et al. One-step local quasi-likelihood estimation. 1999.

[36] Michael A. Saunders et al. Atomic Decomposition by Basis Pursuit. SIAM Journal on Scientific Computing, 1998.

[37] Xiangyu Wang et al. Median Selection Subset Aggregation for Parallel Inference. NIPS, 2014.