Multi-party Sparse Discriminant Learning

Sparse Discriminant Analysis (SDA) has been widely used to improve the performance of classical Fisher's Linear Discriminant Analysis in supervised metric learning, feature selection and classification. With the increasing need for distributed data collection, storage and processing, extending sparse discriminant learning to multi-party distributed computing environments has become an emerging research topic. This paper proposes a novel Multi-Party SDA algorithm, which learns SDA models effectively without sharing any raw data or basic statistics among machines. The proposed algorithm 1) leverages the direct estimation of SDA [1] to derive a distributed loss function for discriminant learning, 2) parameterizes the distributed loss function with local/global estimates through bootstrapping, and 3) approximates a global estimate of the linear discriminant projection vector by optimizing the "distributed bootstrapping loss function" with gossip-based stochastic gradient descent. Experimental results on both synthetic and real-world benchmark datasets show that our algorithm matches the performance of centralized SDA, and significantly outperforms the most recent distributed SDA [2] in terms of accuracy and F1-score.
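The gossip-based optimization in step 3 can be illustrated on a toy objective. The sketch below is not the paper's algorithm: each node's private target vector stands in for its local data/statistics, and the global loss is a simple sum of quadratics rather than the distributed bootstrapping loss. It only demonstrates the communication pattern: nodes take local gradient steps and average their model copies with randomly chosen peers, converging toward the global optimum without ever exchanging raw data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the multi-party setting: each node i holds a private
# target c_i (its "local data"), and the global objective is
# sum_i ||w - c_i||^2, minimized at the mean of the c_i.  Nodes exchange
# only model estimates, never the private c_i.
n_nodes, dim = 4, 3
targets = rng.normal(size=(n_nodes, dim))   # private per-node data
models = rng.normal(size=(n_nodes, dim))    # one model copy per node

for step in range(2000):
    lr = 0.5 / (step + 5)                   # decaying step size
    for i in range(n_nodes):
        # 1) local gradient step on the node's own loss ||w_i - c_i||^2
        models[i] -= lr * 2.0 * (models[i] - targets[i])
        # 2) gossip: average the model with one randomly chosen peer
        j = int(rng.integers(n_nodes))
        models[i] = models[j] = 0.5 * (models[i] + models[j])

global_opt = targets.mean(axis=0)
print(np.abs(models - global_opt).max())    # consensus near the global optimum
```

In the paper's setting, the quadratic local losses would be replaced by each node's bootstrapped contribution to the distributed loss, and the averaged vector would be the sparse discriminant projection estimate.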

[1] Trevor J. Hastie, et al. Sparse Discriminant Analysis. Technometrics, 2011.

[2] J. Berger. Statistical Decision Theory and Bayesian Analysis. 1988.

[3] R. Tibshirani, et al. Covariance-regularized regression and classification for high dimensional problems. Journal of the Royal Statistical Society, Series B, 2009.

[4] M. Hashem Pesaran, et al. Pooled Mean Group Estimation of Dynamic Heterogeneous Panels. 1999.

[5] Hao Chen, et al. Algebraic Geometric Secret Sharing Schemes and Secure Multi-Party Computations over Small Fields. CRYPTO, 2006.

[6] F. O'Sullivan. A Statistical Perspective on Ill-posed Inverse Problems. 1986.

[7] Brian Kingsbury, et al. Efficient one-vs-one kernel ridge regression for speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

[8] Jie Chen, et al. Revisiting Random Binning Features: Fast Convergence and Strong Parallelizability. KDD, 2016.

[9] T. Cai, et al. A Direct Estimation Approach to Sparse Linear Discriminant Analysis. 2011. arXiv:1107.3442.

[10] S. Geer, et al. Confidence intervals for high-dimensional inverse covariance estimation. 2014. arXiv:1403.6752.

[11] Yaoliang Yu, et al. Petuum: A New Platform for Distributed Machine Learning on Big Data. IEEE Transactions on Big Data, 2015.

[12] Dan Bogdanov, et al. High-performance secure multi-party computation for data mining applications. International Journal of Information Security, 2012.

[13] J. Friedman, et al. New Insights and Faster Computations for the Graphical Lasso. 2011.

[14] Scott Klasky, et al. Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma. IEEE Transactions on Big Data, 2015.

[15] David G. Stork, et al. Pattern Classification (2nd ed.). 1999.

[16] J. Shao, et al. Sparse linear discriminant analysis by thresholding for high dimensional data. 2011. arXiv:1105.3561.

[17] István Hegedüs, et al. Gossip learning with linear models on fully distributed data. Concurrency and Computation: Practice and Experience, 2011.

[18] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods, 1999.

[19] Robert P. W. Duin, et al. Expected classification error of the Fisher linear classifier with pseudo-inverse covariance matrix. Pattern Recognition Letters, 1998.

[20] Lu Tian, et al. Communication-efficient Distributed Sparse Linear Discriminant Analysis. AISTATS, 2016.