Distributed Sparse Linear Regression under Communication Constraints

In many domains, statistical tasks are performed in distributed settings, with data split among several end machines connected to a fusion center. In various applications, the end machines have limited bandwidth and power, and thus a tight communication budget. In this work we focus on distributed learning of a sparse linear regression model under severe communication constraints. We propose several two-round distributed schemes whose communication per machine is sublinear in the data dimension. In our schemes, individual machines compute debiased lasso estimators but send only a few values to the fusion center. On the theoretical front, we analyze one of these schemes and prove that with high probability it achieves exact support recovery at low signal-to-noise ratios, where individual machines fail to recover the support. We show in simulations that our scheme performs as well as, and in some cases better than, more communication-intensive approaches.

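To make the high-level description above concrete, the following Python sketch illustrates a scheme of this flavor. It is an illustration only, not the paper's protocol: the ridge-regularized debiasing matrix, the number of coordinates k sent per machine, the voting threshold min_votes, and the single aggregation round are all simplifying assumptions (the proposed schemes use two rounds and their own debiasing and aggregation rules).

```python
import numpy as np
from sklearn.linear_model import Lasso

def debiased_lasso(X, y, alpha):
    """Debiased lasso estimate computed locally on one machine.

    Uses the standard correction beta_d = beta_lasso + (1/n) * M X^T (y - X beta_lasso).
    Here M is a ridge-regularized inverse of the sample covariance, a simple
    stand-in for a proper precision-matrix estimate (e.g., nodewise lasso).
    """
    n, p = X.shape
    beta = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    Sigma = X.T @ X / n
    M = np.linalg.inv(Sigma + 0.01 * np.eye(p))   # assumed regularization level
    return beta + M @ X.T @ (y - X @ beta) / n

def local_message(beta_debiased, k):
    """Each machine sends only its k largest-magnitude coordinates (sublinear in p)."""
    idx = np.argsort(-np.abs(beta_debiased))[:k]
    return idx, beta_debiased[idx]

def fusion_center(messages, p, min_votes):
    """Estimate the support by voting over the coordinates reported by the machines."""
    votes = np.zeros(p, dtype=int)
    for idx, _vals in messages:
        votes[idx] += 1
    return np.where(votes >= min_votes)[0]

# Sketch of the distributed flow: each of the m machines runs debiased_lasso and
# local_message on its own (X_i, y_i); the fusion center aggregates the messages.
# messages = [local_message(debiased_lasso(X_i, y_i, alpha), k) for X_i, y_i in shards]
# support_hat = fusion_center(messages, p, min_votes)
```

The debiasing step is what makes such aggressive compression plausible: it removes the lasso's shrinkage bias coordinate-wise, so even weak signal coordinates tend to appear among many machines' top-k lists, and a simple vote at the fusion center can recover them jointly even when no single machine could.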