AWDA: An Adaptive Wishart Discriminant Analysis

Linear Discriminant Analysis (LDA) is widely used for supervised dimension reduction and linear classification. Classical LDA, however, suffers from an ill-posed estimation problem on data with high dimension and low sample size (HDLSS). To cope with this problem, in this paper we propose an Adaptive Wishart Discriminant Analysis (AWDA) for classification, which makes predictions in an ensemble fashion. Compared to existing approaches, AWDA has two advantages: 1) leveraging the Wishart distribution, AWDA ensembles multiple LDA classifiers parameterized by sampled covariance matrices via a Bayesian voting scheme, which theoretically improves the robustness of classification over LDA classifiers that rely on a single (possibly ill-posed) covariance matrix estimator; 2) AWDA optimally updates the voting weights to adapt to the local information of each new input, enabling nonlinear classification. Theoretical analysis indicates that AWDA closely approximates the optimal Bayesian inference and thus achieves robust performance on high-dimensional data. Extensive experiments on real-world datasets show that our approach outperforms state-of-the-art algorithms by a large margin.
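The ensemble idea described above can be illustrated with a minimal sketch: draw precision matrices from a Wishart posterior, run an LDA discriminant under each draw, and combine the votes with input-dependent weights. The conjugate Wishart prior, the likelihood-based local weighting, and all names below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class Gaussian data in moderately high dimension.
d, n_per_class = 20, 15
X0 = rng.normal(size=(n_per_class, d))                    # class 0 around 0
X1 = rng.normal(size=(n_per_class, d)) + 0.5              # class 1 around 0.5
n = 2 * n_per_class

# Class means and pooled within-class scatter.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

# Conjugate Wishart prior W(I, d) on the precision matrix gives the
# posterior W((I + S)^{-1}, d + n) over precisions (an assumed prior).
scale = np.linalg.inv(np.eye(d) + S)
df = d + n

def sample_precision():
    # Wishart draw: with integer df, the sum of df outer products of
    # N(0, scale) vectors is distributed W(scale, df).
    A = rng.multivariate_normal(np.zeros(d), scale, size=df)
    return A.T @ A

def lda_score(x, P):
    # Linear discriminant under precision P: positive favors class 1.
    w = P @ (m1 - m0)
    return x @ w - 0.5 * (m1 + m0) @ w

def predict(x, n_samples=50):
    # Ensemble of LDA classifiers, one per sampled precision matrix.
    # Votes are weighted by the Gaussian log-likelihood of x under the
    # nearer class model, a stand-in for AWDA's adaptive weighting.
    votes, logws = [], []
    for _ in range(n_samples):
        P = sample_precision()
        s = lda_score(x, P)
        mu = m1 if s > 0 else m0
        _, logdet = np.linalg.slogdet(P)
        logws.append(0.5 * logdet - 0.5 * (x - mu) @ P @ (x - mu))
        votes.append(1.0 if s > 0 else -1.0)
    w = np.exp(np.array(logws) - max(logws))  # stabilized weights
    return 1 if (w * np.array(votes)).sum() > 0 else 0
```

Note that at either empirical class mean the discriminant's sign is fixed for every positive-definite precision draw, so the ensemble agrees unanimously there; the adaptive weights matter for inputs near the decision boundary, where different covariance samples disagree.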
