Distributed Sparse Feature Selection in Communication-Restricted Networks

This paper proposes and theoretically analyzes a new distributed scheme for sparse linear regression and feature selection. The primary goal is to learn the few causal features of a high-dimensional dataset from noisy observations of an unknown sparse linear model. However, the training set, which consists of $n$ data samples in $\mathbb{R}^p$, is already distributed over a large network of $N$ clients connected through extremely low-bandwidth links. We consider the asymptotic regime $1\ll N\ll n\ll p$. To infer the causal dimensions from the whole dataset, we propose a simple yet effective method for information sharing in the network. We theoretically show that the true causal features can be reliably recovered with a negligible bandwidth usage of $O\left(N\log p\right)$ across the network. This is significantly cheaper than the trivial approach of transmitting all samples to a single node (the centralized scenario), which requires $O\left(np\right)$ transmissions; even more sophisticated schemes such as ADMM still have a communication complexity of $O\left(Np\right)$. Surprisingly, our sample complexity bound is proved to be the same (up to a constant factor) as that of the optimal centralized approach for a fixed per-node performance measure, while the bound for a naïve decentralized technique grows linearly with $N$. The theoretical guarantees in this paper are based on the recent analytic framework of the debiased LASSO in Javanmard et al. (2019), and are supported by several computer experiments on both synthetic and real-world datasets.
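To make the communication argument concrete, the following is a minimal sketch of one plausible realization of such a scheme, not the paper's exact algorithm: it assumes each client fits a local LASSO, applies a simple debiasing correction (here with the identity matrix in place of the precision-matrix-based correction used in the debiased-LASSO framework), transmits only the indices of coordinates whose debiased estimate exceeds a threshold (roughly $|S|\log_2 p$ bits per client), and a coordinator keeps the features reported by a majority of clients. All function names and parameters (`local_support`, `aggregate`, `tau`, `min_votes`) are hypothetical illustrations.

```python
# Hedged sketch of a distributed, low-communication feature-selection pipeline.
# Assumption: local LASSO + identity-matrix debiasing + index-only transmission
# + majority vote at the coordinator. This is an illustration, not the paper's
# exact procedure.
import numpy as np
from sklearn.linear_model import Lasso


def local_support(X, y, lam, tau):
    """Return the feature indices a single client would transmit."""
    n, p = X.shape
    beta = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y).coef_
    residual = y - X @ beta
    # Debiasing step with M = I for simplicity; the debiased-LASSO analysis
    # uses a precision-matrix-based M instead.
    beta_debiased = beta + X.T @ residual / n
    sigma_hat = np.linalg.norm(residual) / np.sqrt(n)   # rough noise estimate
    keep = np.abs(beta_debiased) > tau * sigma_hat / np.sqrt(n)
    return np.flatnonzero(keep)          # indices only: ~ |S| * log2(p) bits


def aggregate(supports, p, min_votes):
    """Coordinator: keep features reported by at least `min_votes` clients."""
    votes = np.zeros(p, dtype=int)
    for idx in supports:
        votes[idx] += 1
    return np.flatnonzero(votes >= min_votes)


# Toy experiment: N clients, n samples per client, p features, s causal ones.
rng = np.random.default_rng(0)
N, n, p, s = 10, 200, 1000, 5
beta_true = np.zeros(p)
beta_true[:s] = 1.0
supports = []
for _ in range(N):
    X = rng.standard_normal((n, p))
    y = X @ beta_true + 0.5 * rng.standard_normal(n)
    supports.append(local_support(X, y, lam=0.1, tau=3.0))
print("recovered features:", aggregate(supports, p, min_votes=N // 2 + 1))
```

Under this sketch, each client's per-coordinate false positives land on essentially random indices, so the majority vote suppresses them while the few true features, which every client reports, survive; the coordinator never sees raw data, only short index lists.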

[1] Verónica Bolón-Canedo et al. Centralized vs. distributed feature selection methods based on data complexity measures. Knowledge-Based Systems, 2017.

[2] Francisco Herrera et al. BELIEF: A distance-based redundancy-proof feature selection method for Big Data. Information Sciences, 2018.

[3] E. Candès et al. Controlling the false discovery rate via knockoffs. arXiv:1404.5609, 2014.

[4] Enkelejd Hashorva et al. On multivariate Gaussian tails. 2003.

[5] Vijay Raghunathan et al. Communication-efficient View-Pooling for Distributed Multi-View Neural Networks. 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2020.

[6] D. Donoho et al. Basis pursuit. Proceedings of the 28th Asilomar Conference on Signals, Systems and Computers, 1994.

[7] Ross Jacobucci et al. Regularized Structural Equation Modeling to Detect Measurement Bias: Evaluation of Lasso, Adaptive Lasso, and Elastic Net. 2020.

[8] Cun-Hui Zhang et al. Confidence intervals for low dimensional parameters in high dimensional linear models. arXiv:1110.2563, 2011.

[9] Adel Javanmard et al. Hypothesis Testing in High-Dimensional Regression Under the Gaussian Random Design Model: Asymptotic Theory. IEEE Transactions on Information Theory, 2013.

[10] Yonina C. Eldar et al. Modified distributed iterative hard thresholding. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

[11] Xi Chen et al. Distributed High-dimensional Regression Under a Quantile Loss Function. Journal of Machine Learning Research, 2019.

[12] Mladen Kolar et al. Efficient Distributed Learning with Sparsity. ICML, 2016.

[13] R. Glowinski et al. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. 1975.

[14] Adel Javanmard et al. False Discovery Rate Control via Debiased Lasso. Electronic Journal of Statistics, 2018.

[15] Ali H. Sayed et al. Sparse Distributed Learning Based on Diffusion Adaptation. IEEE Transactions on Signal Processing, 2012.

[16] S. van de Geer et al. On asymptotically optimal confidence regions and tests for high-dimensional models. arXiv:1303.0518, 2013.

[17] J. Ramon et al. Hoeffding's inequality for sums of weakly dependent random variables. arXiv:1507.06871, 2015.

[18] Hillol Kargupta et al. A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks. Knowledge and Information Systems, 2009.

[19] Stephen P. Boyd et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2011.

[20] João M. F. Xavier et al. D-ADMM: A Communication-Efficient Distributed Algorithm for Separable Optimization. IEEE Transactions on Signal Processing, 2012.

[21] Qiang Liu et al. Communication-efficient Sparse Regression. Journal of Machine Learning Research, 2017.

[22] Zhi-Quan Luo et al. On the linear convergence of the alternating direction method of multipliers. Mathematical Programming, 2012.

[23] Yonina C. Eldar et al. Distributed sparse signal recovery for sensor networks. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.

[24] Fatiha Mrabti et al. SDPSO: Spark Distributed PSO-based approach for feature selection and cancer disease prognosis. Journal of Big Data, 2021.

[25] Michael I. Jordan et al. A General Analysis of the Convergence of ADMM. ICML, 2015.

[26] Michael I. Jordan et al. CoCoA: A General Framework for Communication-Efficient Distributed Optimization. Journal of Machine Learning Research, 2016.

[27] Adel Javanmard et al. Debiasing the lasso: Optimal sample size for Gaussian designs. The Annals of Statistics, 2015.

[28] Blaise Agüera y Arcas et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS, 2016.

[29] Mike E. Davies et al. Iterative Hard Thresholding for Compressed Sensing. arXiv, 2008.

[30] Yajie Bao et al. One-Round Communication Efficient Distributed M-Estimation. AISTATS, 2021.

[31] Verónica Bolón-Canedo et al. Parallel feature selection for distributed-memory clusters. Information Sciences, 2019.

[32] R. Shafer et al. Genotypic predictors of human immunodeficiency virus type 1 drug resistance. Proceedings of the National Academy of Sciences, 2006.

[33] Lei Wang et al. Communication-efficient estimation of high-dimensional quantile regression. 2020.

[34] B. Mercier et al. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. 1976.

[35] T. Blumensath et al. Iterative Thresholding for Sparse Approximations. 2008.

[36] Chujin Li et al. Communication-Efficient Modeling with Penalized Quantile Regression for Distributed Data. Complexity, 2021.

[37] Qiang Li et al. Diffusion fused sparse LMS algorithm over networks. Signal Processing, 2020.

[38] Jim Austin et al. Hadoop neural network for parallel and distributed feature selection. Neural Networks, 2016.

[39] Lars K. Rasmussen et al. Locally Convex Sparse Learning over Networks. arXiv, 2018.

[40] Yu Gui et al. ADAGES: Adaptive Aggregation with Stability for Distributed Feature Selection. FODS, 2020.

[41] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. 1996.

[42] Heng Lian et al. Debiasing and Distributed Estimation for High-Dimensional Quantile Regression. IEEE Transactions on Neural Networks and Learning Systems, 2020.