MP2SDA: Multi-Party Parallelized Sparse Discriminant Learning

Sparse Discriminant Analysis (SDA) is widely used to improve on classical Fisher’s Linear Discriminant Analysis in supervised metric learning, feature selection, and classification. With the growing need for distributed data collection, storage, and processing, adapting sparse discriminant learning to multi-party distributed computing environments has become an emerging research topic. This article proposes a novel multi-party SDA algorithm that learns SDA models effectively without sharing any raw data or basic statistics among machines. The proposed algorithm (1) leverages the direct estimation of SDA to derive a distributed loss function for discriminant learning, (2) parameterizes the distributed loss function with local/global estimates obtained through bootstrapping, and (3) approximates a global estimate of the linear discriminant projection vector by optimizing the “distributed bootstrapping loss function” with gossip-based stochastic gradient descent. Experimental results on both synthetic and real-world benchmark datasets show that our algorithm performs comparably to the aggregated SDA and significantly outperforms the most recent distributed SDA in terms of accuracy and F1-score.
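To make the gossip-based optimization in step (3) more concrete, the following is a minimal sketch and not the authors' algorithm. It rests on several assumptions that are not stated in the abstract: the loss is taken to be the penalized quadratic surrogate 0.5 β'Σβ − δ'β + λ‖β‖₁ (whose unpenalized minimizer is Σ⁻¹δ), each party uses its full local gradient in place of a stochastic one, and the bootstrapping step (2) is omitted entirely. All function names and parameters (`local_stats`, `gossip_sda`, `lr`, `lam`, and so on) are hypothetical. Parties exchange only their current projection vectors, never data or covariance estimates.

```python
import random

def local_stats(X0, X1):
    """Pooled covariance and class-mean difference from one party's local data."""
    d = len(X0[0])
    mu0 = [sum(x[j] for x in X0) / len(X0) for j in range(d)]
    mu1 = [sum(x[j] for x in X1) / len(X1) for j in range(d)]
    n = len(X0) + len(X1)
    cov = [[0.0] * d for _ in range(d)]
    for X, mu in ((X0, mu0), (X1, mu1)):
        for x in X:
            c = [x[j] - mu[j] for j in range(d)]
            for a in range(d):
                for b in range(d):
                    cov[a][b] += c[a] * c[b] / n
    delta = [m1 - m0 for m0, m1 in zip(mu0, mu1)]
    return cov, delta

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [(abs(x) - t) * (1.0 if x > 0 else -1.0) if abs(x) > t else 0.0
            for x in v]

def local_step(beta, cov, delta, lr, lam):
    """One proximal gradient step on 0.5*b'Cb - delta'b + lam*||b||_1 (assumed loss)."""
    d = len(beta)
    grad = [sum(cov[a][b] * beta[b] for b in range(d)) - delta[a]
            for a in range(d)]
    return soft_threshold([beta[a] - lr * grad[a] for a in range(d)], lr * lam)

def gossip_sda(parties, rounds=300, lr=0.1, lam=0.3, seed=0):
    """Alternate local updates with pairwise averaging of projection vectors."""
    rng = random.Random(seed)
    # Statistics are computed and kept locally; they are never exchanged.
    stats = [local_stats(X0, X1) for X0, X1 in parties]
    d = len(stats[0][1])
    betas = [[0.0] * d for _ in parties]
    for _ in range(rounds):
        betas = [local_step(b, c, dl, lr, lam)
                 for b, (c, dl) in zip(betas, stats)]
        # Gossip: one random pair of parties averages its current vectors.
        i, j = rng.sample(range(len(parties)), 2)
        avg = [(u + v) / 2 for u, v in zip(betas[i], betas[j])]
        betas[i], betas[j] = avg, list(avg)
    return betas

def run_demo(seed=0, n_parties=4, n_per_class=50, d=5):
    """Synthetic two-class data where only the first feature is discriminative."""
    rng = random.Random(seed)
    shift = [2.0] + [0.0] * (d - 1)
    parties = []
    for _ in range(n_parties):
        X0 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_per_class)]
        X1 = [[rng.gauss(shift[j], 1) for j in range(d)]
              for _ in range(n_per_class)]
        parties.append((X0, X1))
    return gossip_sda(parties, seed=seed)
```

Calling `run_demo()` returns one projection vector per party. In this toy setting the first coordinate should dominate each vector, since only the first feature separates the classes. With a constant step size the parties agree only approximately, so a practical implementation would likely decay `lr` or add a final averaging step.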
