A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data

In this paper, we consider a scale-adjusted, distance-based classifier for high-dimensional data. We first give such a classifier for two-class classification that can ensure high accuracy in terms of misclassification rates. We show that the classifier is not only consistent but also asymptotically normal for high-dimensional data. We provide a sample size determination so that the misclassification rates are no more than a prespecified value. We propose a classification procedure called the misclassification rate adjusted classifier. We further extend the classifier to multiclass classification. We show that the extended classifier retains these asymptotic properties and ensures high accuracy in misclassification rates for multiclass classification. Finally, we demonstrate the proposed classifier in actual data analyses using a microarray data set.
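
As a rough illustration of what a scale-adjusted, distance-based rule can look like, the sketch below assigns an observation x0 to the class i that minimizes ||x0 - mean_i||^2 - tr(S_i)/n_i, where mean_i, S_i and n_i are the sample mean, sample covariance matrix and sample size of class i. Subtracting tr(S_i)/n_i removes the bias that estimating the mean adds to the squared distance. This specific form, and the function names `class_stats` and `classify`, are assumptions made for illustration; the abstract does not state the exact rule, and the misclassification rate adjustment and sample size determination are not reproduced here.

```python
import random


def class_stats(samples):
    """Return (mean vector, tr(S)/n) for one class, where S is the sample covariance."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    # tr(S) is the sum of the per-coordinate sample variances (divisor n - 1).
    trace = sum((x[j] - mean[j]) ** 2 for x in samples for j in range(d)) / (n - 1)
    return mean, trace / n


def classify(x0, stats):
    """Index of the class minimizing the bias-corrected squared distance to its mean."""
    scores = [
        sum((a - b) ** 2 for a, b in zip(x0, mean)) - bias
        for mean, bias in stats
    ]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical demo data: three classes in dimension 500 with small, unequal
# sample sizes and unequal scales (all values are made up for illustration).
random.seed(0)
dim = 500
settings = [(0.0, 1.0, 5), (0.3, 2.0, 10), (-0.3, 0.5, 8)]  # (mean shift, sd, n)
training = [
    [[random.gauss(shift, sd) for _ in range(dim)] for _ in range(n)]
    for shift, sd, n in settings
]
stats = [class_stats(samples) for samples in training]
new_point = [random.gauss(0.3, 2.0) for _ in range(dim)]
predicted = classify(new_point, stats)  # index into `settings`
```

Because the rule only needs a mean vector and one trace per class, it extends to any number of classes without inverting a covariance matrix, which is what makes this kind of rule usable when the dimension far exceeds the sample sizes.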
