A Direct Approach for Sparse Quadratic Discriminant Analysis

Quadratic discriminant analysis (QDA) is a standard tool for classification because of its simplicity and flexibility. However, because its number of parameters grows quadratically with the number of variables, QDA becomes impractical when the dimensionality is relatively large. To address this, we propose a novel procedure named QUDA for QDA in high-dimensional data analysis. Formulated in a simple and coherent framework, QUDA directly estimates the key quantities in the Bayes discriminant function, including the quadratic interactions and a linear index of the variables used for classification. Under appropriate sparsity assumptions, we establish consistency results for estimating the interactions and the linear index, and further show that the misclassification rate of our procedure converges to the optimal Bayes risk, even when the dimensionality grows exponentially with the sample size. We develop an efficient ADMM-based algorithm (alternating direction method of multipliers) for finding the interactions, which is much faster than its competitor in the literature. The promising performance of QUDA is illustrated via extensive simulation studies and the analysis of two datasets.
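
To make the phrase "quadratic interactions and a linear index" concrete, the following is a minimal sketch of the population-level Bayes discriminant function for two Gaussian classes, written in terms of an interaction matrix and a linear index. The symbols `Omega`, `delta`, and `eta`, the function names, and the parameterization around the midpoint of the two class means are assumptions made here for illustration; the abstract does not spell them out. The sketch uses known parameters and plain matrix inversion, so it shows the quantities QUDA targets, not the sparse estimation procedure itself.

```python
import numpy as np


def bayes_quadratic_discriminant(z, mu1, mu2, sigma1, sigma2, pi1=0.5, pi2=0.5):
    """Return D(z); D(z) > 0 favors class 1 (illustrative notation, assumed here).

    D(z) = (z - mu)^T Omega (z - mu) + delta^T (z - mu) + eta, where
      mu    = (mu1 + mu2) / 2                       midpoint of the class means
      Omega = inv(sigma2) - inv(sigma1)             quadratic interactions
      delta = (inv(sigma1) + inv(sigma2)) (mu1-mu2) linear index
      eta   = 2 log(pi1/pi2) + (mu1-mu2)^T Omega (mu1-mu2) / 4
              + log|sigma2| - log|sigma1|           constant term
    """
    prec1 = np.linalg.inv(sigma1)
    prec2 = np.linalg.inv(sigma2)
    diff = mu1 - mu2
    mu = (mu1 + mu2) / 2.0

    Omega = prec2 - prec1
    delta = (prec1 + prec2) @ diff
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    eta = 2.0 * np.log(pi1 / pi2) + 0.25 * diff @ Omega @ diff + logdet2 - logdet1

    u = z - mu
    return u @ Omega @ u + delta @ u + eta


def twice_log_posterior_ratio(z, mu1, mu2, sigma1, sigma2, pi1=0.5, pi2=0.5):
    """Direct computation of 2 * log(pi1 f1(z) / (pi2 f2(z))) for a sanity check."""
    def quad_form(mean, cov):
        r = z - mean
        return r @ np.linalg.solve(cov, r)

    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    return (2.0 * np.log(pi1 / pi2) - logdet1 + logdet2
            - quad_form(mu1, sigma1) + quad_form(mu2, sigma2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 4
    a, b = rng.normal(size=(p, p)), rng.normal(size=(p, p))
    sigma1 = a @ a.T + p * np.eye(p)  # positive definite by construction
    sigma2 = b @ b.T + p * np.eye(p)
    mu1, mu2 = rng.normal(size=p), rng.normal(size=p)
    z = rng.normal(size=p)
    d = bayes_quadratic_discriminant(z, mu1, mu2, sigma1, sigma2, 0.3, 0.7)
    print("assign to class", 1 if d > 0 else 2)
```

In this parameterization the discriminant needs the two precision matrices only through their difference (`Omega`) and through the vector `delta`, which is what makes direct sparse estimation of these two objects attractive when the number of variables is large. When the two covariance matrices coincide, `Omega` vanishes and the rule reduces to a linear one.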
