Communication-efficient Distributed Sparse Linear Discriminant Analysis

We propose a communication-efficient distributed estimation method for sparse linear discriminant analysis (LDA) in the high-dimensional regime. Our method distributes the data of size N across m machines and computes a local sparse LDA estimator on each machine from its data subset of size N/m. After this distributed estimation step, our method aggregates the debiased local estimators from the m machines and sparsifies the aggregated estimator. We show that the aggregated estimator attains the same statistical rate as the centralized estimation method, provided the number of machines m is chosen appropriately. Moreover, we prove that our method attains model selection consistency under a milder condition than the centralized method. Experiments on both synthetic and real datasets corroborate our theory.
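A minimal sketch of this divide-and-conquer pipeline is given below. It assumes, purely for illustration, an l1-penalized least-squares surrogate for the local sparse LDA step, a diagonal stand-in for the precision-matrix estimate used in the debiasing step, and a fixed hard threshold for the final sparsification; these choices, names, and parameter values are not the exact construction analyzed in the paper.

```python
# Illustrative one-shot divide-and-conquer sparse LDA (sketch, not the paper's exact method):
# each machine fits an l1-penalized discriminant direction, debiases it, the center
# averages the debiased directions and hard-thresholds the average.
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def local_debiased_sparse_lda(X0, X1, lam, n_iter=200):
    """On one machine: minimize 0.5 * b'Sb - d'b + lam * ||b||_1 by coordinate descent,
    with S the pooled covariance and d = mean(X1) - mean(X0), then debias b."""
    d = X1.mean(axis=0) - X0.mean(axis=0)
    Z = np.vstack([X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)])
    S = Z.T @ Z / Z.shape[0] + 1e-3 * np.eye(Z.shape[1])  # small ridge for stability
    p = S.shape[0]
    b = np.zeros(p)
    for _ in range(n_iter):                      # cyclic coordinate descent
        for j in range(p):
            r = d[j] - S[j] @ b + S[j, j] * b[j]  # partial residual excluding coordinate j
            b[j] = soft_threshold(r, lam) / S[j, j]
    # one-step debiasing; a diagonal surrogate replaces the precision-matrix estimate here
    return b + (d - S @ b) / np.diag(S)


def distributed_sparse_lda(machines, lam=0.1, tau=0.05):
    """Average the debiased local directions, then sparsify the aggregate by hard thresholding."""
    avg = np.mean([local_debiased_sparse_lda(X0, X1, lam) for X0, X1 in machines], axis=0)
    return avg * (np.abs(avg) > tau)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n, m = 50, 200, 4                          # dimension, per-class size per machine, machines
    beta_true = np.zeros(p)
    beta_true[:3] = 1.0                           # sparse discriminant direction (identity covariance)
    machines = [(rng.normal(size=(n, p)),
                 rng.normal(size=(n, p)) + beta_true) for _ in range(m)]
    beta_hat = distributed_sparse_lda(machines)
    print("recovered support:", np.nonzero(beta_hat)[0])
```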
