FWDA: a Fast Wishart Discriminant Analysis with its Application to Electronic Health Records Data Classification

Linear Discriminant Analysis (LDA) on Electronic Health Records (EHR) data is widely used for early detection of diseases. Classical LDA for EHR data classification, however, suffers from two handicaps: the ill-posed estimation of LDA parameters (e.g., the covariance matrix) and the "linear inseparability" of EHR data. To handle these two issues, we propose a novel classifier, FWDA (Fast Wishart Discriminant Analysis), that makes predictions in an ensemble way. Specifically, FWDA first approximates the distribution of inverse covariance matrices with a Wishart distribution estimated from the training data, then computes a weighted average of the classification results of multiple LDA classifiers, each parameterized by a sampled inverse covariance matrix, via a Bayesian Voting scheme. The voting weights are optimally updated to adapt to each new input, which enables nonlinear classification. Theoretical analysis indicates that FWDA possesses a fast convergence rate and robust performance on high-dimensional data. Extensive experiments on a large-scale EHR dataset show that our approach outperforms state-of-the-art algorithms by a large margin.
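
The pipeline described above (estimate a Wishart distribution over inverse covariance matrices, sample from it, and combine the resulting LDA classifiers with input-dependent weights) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the ridge term, the choice of Wishart scale and degrees of freedom, and in particular the softmax-on-margin weighting are all assumptions made for illustration, since the abstract does not specify how the voting weights are optimized.

```python
import numpy as np

def fit_gaussian_stats(X, y, ridge=1e-3):
    """Class means, pooled covariance, and log prior ratio for labels 0/1."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    centered = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    cov = centered.T @ centered / (len(X) - 2)
    cov += ridge * np.eye(X.shape[1])  # assumption: small ridge keeps cov invertible
    log_prior_ratio = np.log((y == 1).mean() / (y == 0).mean())
    return mu0, mu1, cov, log_prior_ratio

def sample_wishart(scale, dof, rng):
    """One draw from Wishart(scale, dof) for integer dof >= dimension."""
    chol = np.linalg.cholesky(scale)
    z = rng.standard_normal((dof, scale.shape[0])) @ chol.T  # rows ~ N(0, scale)
    return z.T @ z

def ensemble_predict(X_new, mu0, mu1, precisions, log_prior_ratio):
    """Weighted vote over LDA classifiers, one per sampled precision matrix."""
    mid, delta = (mu0 + mu1) / 2, mu1 - mu0
    # Linear discriminant score of each classifier for each input: shape (K, n).
    scores = np.stack([(X_new - mid) @ P @ delta + log_prior_ratio
                       for P in precisions])
    # Assumption (hypothetical weighting, not the paper's scheme): per-input
    # softmax over |score|, so more confident classifiers get more weight.
    margin = np.abs(scores)
    w = np.exp(margin - margin.max(axis=0))
    w /= w.sum(axis=0)
    return ((w * scores).sum(axis=0) > 0).astype(int)

# Toy data standing in for EHR feature vectors (purely synthetic).
rng = np.random.default_rng(0)
d, n = 5, 100
X = np.vstack([rng.standard_normal((n, d)), rng.standard_normal((n, d)) + 3.0])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

mu0, mu1, cov, log_prior_ratio = fit_gaussian_stats(X, y)
dof = 50  # assumption: degrees of freedom chosen by hand
# Scale chosen so the Wishart mean equals the estimated inverse covariance.
scale = np.linalg.inv(cov) / dof
precisions = [sample_wishart(scale, dof, rng) for _ in range(20)]

X_test = np.vstack([rng.standard_normal((n, d)), rng.standard_normal((n, d)) + 3.0])
accuracy = (ensemble_predict(X_test, mu0, mu1, precisions, log_prior_ratio) == y).mean()
print(f"ensemble accuracy on synthetic data: {accuracy:.3f}")
```

Because the weights depend on the input, the combined decision boundary is no longer a single hyperplane even though each member classifier is linear, which is the mechanism the abstract credits for nonlinear classification.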