Paper Information - FWDA: a Fast Wishart Discriminant Analysis with its Application to Electronic Health Records Data Classification

Linear Discriminant Analysis (LDA) on Electronic Health Records (EHR) data is widely used for early detection of diseases. Classical LDA for EHR data classification, however, suffers from two handicaps: the ill-posed estimation of LDA parameters (e.g., the covariance matrix), and the "linear inseparability" of EHR data. To handle these two issues, we propose a novel classifier, FWDA (Fast Wishart Discriminant Analysis), that makes predictions in an ensemble way. Specifically, FWDA first approximates the distribution of inverse covariance matrices with a Wishart distribution estimated from the training data, then computes a weighted average of the classification results of multiple LDA classifiers, each parameterized by a sampled inverse covariance matrix, via a Bayesian Voting scheme. The voting weights are optimally updated to adapt to each new input, which enables nonlinear classification. Theoretical analysis indicates that FWDA has a fast convergence rate and robust performance on high-dimensional data. Extensive experiments on a large-scale EHR dataset show that our approach outperforms state-of-the-art algorithms by a large margin.
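
The ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary problem, a ridge-regularized pooled covariance, and a Wishart distribution whose mean equals the inverse of that covariance. The function names `fit_wishart_lda` and `predict_proba`, the `ridge` parameter, and the uniform default voting weights are all hypothetical; in particular, the paper's input-adaptive Bayesian voting weights are not reproduced here.

```python
import numpy as np


def fit_wishart_lda(X, y, n_models=50, ridge=1e-3, seed=0):
    """Build an ensemble of LDA models with Wishart-sampled precision matrices.

    Assumption: labels are 0/1 and both classes share one covariance matrix.
    """
    rng = np.random.default_rng(seed)
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n, d = X.shape

    # Pooled within-class covariance; the ridge term keeps it invertible.
    centered = np.vstack([X0 - mu0, X1 - mu1])
    cov = centered.T @ centered / (n - 2) + ridge * np.eye(d)

    # A Wishart(scale, dof) matrix has mean dof * scale, so this choice of
    # scale centers the sampled precision matrices on inv(cov).
    dof = max(n - 2, d + 1)
    scale = np.linalg.inv(cov) / dof
    scale = (scale + scale.T) / 2  # remove tiny numerical asymmetry

    precisions = []
    for _ in range(n_models):
        # For integer dof, a Wishart draw is a sum of dof outer products
        # of N(0, scale) vectors, i.e. Z^T Z with Z of shape (dof, d).
        z = rng.multivariate_normal(np.zeros(d), scale, size=dof)
        precisions.append(z.T @ z)

    return {
        "mu0": mu0,
        "mu1": mu1,
        "precisions": precisions,
        "log_prior_ratio": np.log(len(X1) / len(X0)),
    }


def predict_proba(model, X, weights=None):
    """Weighted average of per-model class-1 probabilities.

    Assumption: uniform weights stand in for the paper's adaptive weights.
    """
    mid = (model["mu0"] + model["mu1"]) / 2
    diff = model["mu1"] - model["mu0"]
    # scores[j, i] is the LDA log-odds of sample i under precision matrix j.
    scores = np.stack(
        [(X - mid) @ P @ diff + model["log_prior_ratio"] for P in model["precisions"]]
    )
    probs = 1.0 / (1.0 + np.exp(-np.clip(scores, -50, 50)))
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))
    return weights @ probs
```

Passing a different `weights` vector for each input is where an adaptive voting scheme would plug in; with the uniform default, the ensemble is an average of linear classifiers' probabilities rather than the paper's method.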

Wei Cheng | Haoyi Xiong | Jiang Bian | Wenqing Hu | Zhishan Guo
