Accuracy-Rejection Curves (ARCs) for Comparing Classification Methods with a Reject Option

Data extracted from microarrays are now considered an important source of knowledge about various diseases. Several studies have compared supervised machine learning approaches on microarray data using receiver operating characteristic (ROC) graphs. These comparisons rely on classification schemes in which every sample is classified, regardless of how much confidence a given classifier assigns to its decision for that sample. In healthcare, it is safer to refrain from classifying a sample when the confidence assigned to the classification is not high enough than to classify all samples even when confidence is low. We describe an approach in which the performance of different classifiers is compared, with the possibility of rejection, based on several reject areas. Using the tradeoff between accuracy and rejection, we propose accuracy-rejection curves (ARCs) and describe three types of relationship between the ARCs of two classifiers that can be used to compare them. Empirical results on purely synthetic data, semi-synthetic data (generated from real data obtained from patients) and public microarray data for binary classification problems demonstrate the efficacy of this method.
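As an illustration of the accuracy-rejection tradeoff, the following is a minimal sketch of how one point-by-point ARC might be computed. It is not the authors' implementation. It assumes each prediction comes with a scalar confidence (for example, the largest estimated posterior probability), that the least confident samples are rejected first, and that accuracy is measured only on the samples that remain. The function name `accuracy_rejection_curve` and the toy data are hypothetical.

```python
import numpy as np

def accuracy_rejection_curve(y_true, y_pred, confidence, rejection_rates):
    """Accuracy on the accepted samples at each rejection rate.

    Assumption: samples are rejected in order of increasing confidence.
    Returns NaN where every sample is rejected (accuracy is undefined).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    confidence = np.asarray(confidence, dtype=float)

    # Sort so that the most confident predictions come first.
    order = np.argsort(-confidence, kind="stable")
    correct = y_true[order] == y_pred[order]
    n = len(correct)

    accuracies = []
    for rate in rejection_rates:
        n_keep = n - int(round(rate * n))  # number of accepted samples
        if n_keep <= 0:
            accuracies.append(float("nan"))
        else:
            accuracies.append(float(correct[:n_keep].mean()))
    return np.array(accuracies)

# Toy binary problem: the six most confident predictions are correct,
# the four least confident ones are wrong.
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
confidence = [0.99, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55]

rates = [0.0, 0.2, 0.4, 0.5]
print(accuracy_rejection_curve(y_true, y_pred, confidence, rates))
```

Running the same function on the outputs of two classifiers over a shared grid of rejection rates gives two curves that can be plotted on the same axes and compared.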
