Weighted Distance Weighted Discrimination and Its Asymptotic Properties

While Distance Weighted Discrimination (DWD) is an appealing approach to classification in high dimensions, it was designed for balanced datasets. In the case of unequal costs, biased sampling, or unbalanced data, major improvements are available using appropriately weighted versions of DWD (wDWD). A major contribution of this paper is the development of optimal weighting schemes for various nonstandard classification problems. In addition, we discuss several alternative criteria, propose an adaptive weighting scheme (awDWD), and demonstrate its advantages over nonadaptive weighting schemes in some situations. The second major contribution is a theoretical study of weighted DWD. Both the high-dimensional, low-sample-size asymptotics and the Fisher consistency of DWD are studied. The performance of weighted DWD is evaluated using simulated examples and two real data examples. The theoretical results are also confirmed by simulations.
