Paper information - Stable classification with applications to microarray data


Abstract: A stable classification method, the minimum-error-distance threshold (MEDT) with variable selection, is developed for the two-class prediction (classification) problem. First, a set of “significant” variables (genes) associated with the two classes is selected using the Wilcoxon rank-sum test. A data-driven cutoff point for a distance-based classification algorithm is then determined by minimizing a combination of the false-positive and false-negative rates estimated by leave-one-out cross-validation. This cutoff point is used to classify a given test set based on the selected variables. The proposed methodology is applied to the leukemia data set analyzed in Golub et al. (Science 286 (1999) 531). To compare it with two existing discrimination methods, diagonal linear discriminant analysis and nearest-neighbor classifiers, 1000 cross-validations are performed. Each randomly splits the data set into a training set of 32 patients with acute lymphoblastic leukemia (ALL) and 16 with acute myeloid leukemia (AML) and a test set of 15 patients with ALL and 9 with AML, and performance summaries are calculated. A simulation study is conducted to demonstrate the superior stability of MEDT compared with these existing methods. The stability measure used is the mean-to-standard-deviation ratio of the number of correct predictions.
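The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation, because the abstract does not specify the distance, the selection threshold, or how the two error rates are combined. The sketch assumes: genes are ranked by the absolute Wilcoxon rank-sum z statistic and the top `k` are kept; the score is the Euclidean distance to the class-0 centroid minus the distance to the class-1 centroid; and the cost is a weighted sum of the two error rates with a hypothetical `weight` parameter. All function names and the toy data are illustrative.

```python
import math
import statistics


def rank_sum_z(x, y):
    """Wilcoxon rank-sum z statistic of x versus y (normal approximation,
    midranks for ties, no tie correction of the variance)."""
    values = sorted(list(x) + list(y))
    rank, i = {}, 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n, m = len(x), len(y)
    w = sum(rank[v] for v in x)
    return (w - n * (n + m + 1) / 2) / math.sqrt(n * m * (n + m + 1) / 12)


def select_genes(X, y, k):
    """Indices of the k genes with the largest |z| (assumed selection rule)."""
    pos = [row for row, lab in zip(X, y) if lab == 1]
    neg = [row for row, lab in zip(X, y) if lab == 0]
    z = [abs(rank_sum_z([r[g] for r in pos], [r[g] for r in neg]))
         for g in range(len(X[0]))]
    return sorted(range(len(z)), key=lambda g: -z[g])[:k]


def score(sample, X, y, genes):
    """Assumed score: distance to class-0 centroid minus distance to
    class-1 centroid, so larger values look more like class 1."""
    def centroid(label):
        rows = [r for r, lab in zip(X, y) if lab == label]
        return [statistics.fmean([r[g] for r in rows]) for g in genes]
    s = [sample[g] for g in genes]
    return math.dist(s, centroid(0)) - math.dist(s, centroid(1))


def loocv_scores(X, y, k):
    """Leave-one-out scores; gene selection is redone inside every fold."""
    out = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        out.append(score(X[i], X_tr, y_tr, select_genes(X_tr, y_tr, k)))
    return out


def choose_cutoff(scores, y, weight=0.5):
    """Cutoff minimizing weight * FP rate + (1 - weight) * FN rate,
    searched over midpoints between consecutive distinct scores."""
    s = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(s, s[1:])] or s

    def cost(c):
        fp = sum(sc > c and lab == 0 for sc, lab in zip(scores, y)) / y.count(0)
        fn = sum(sc <= c and lab == 1 for sc, lab in zip(scores, y)) / y.count(1)
        return weight * fp + (1 - weight) * fn

    return min(candidates, key=cost)


def stability(correct_counts):
    """Mean-to-standard-deviation ratio of the number of correct predictions."""
    return statistics.mean(correct_counts) / statistics.stdev(correct_counts)


# Toy data (made up): 6 samples x 3 genes; only gene 0 separates the classes.
X = [[0.0, 5.0, 1.0], [0.2, 1.0, 1.1], [0.1, 3.0, 0.9],
     [1.0, 2.0, 1.0], [1.2, 4.0, 1.05], [1.1, 0.5, 0.95]]
y = [0, 0, 0, 1, 1, 1]
scores = loocv_scores(X, y, k=1)
cutoff = choose_cutoff(scores, y)
print(select_genes(X, y, 1), cutoff)
```

A new sample would then be assigned to class 1 when its score exceeds `cutoff`. Repeating the split, fit, and prediction many times and passing the resulting counts of correct predictions to `stability` gives the kind of ratio the abstract uses to compare methods.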

Cheng Cheng | Chin-Shang Li

[1] R. Tibshirani, et al. Significance analysis of microarrays applied to the ionizing radiation response, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[2] M. M. Barnard. The secular variations of skull characters in four series of Egyptian skulls, 1935.

[3] R. Fisher. The use of multiple measurements in taxonomic problems, 1936.

[4] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.

[5] Ash A. Alizadeh, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, 2000, Nature.

[6] Trevor Hastie, et al. Class prediction by nearest shrunken centroids, with applications to DNA microarrays, 2003.

[7] S. Dudoit, et al. Comparison of discrimination methods for the classification of tumors using gene expression data, 2002.

[8] P. Brown, et al. Exploring the metabolic and genetic control of gene expression on a genomic scale, 1997, Science.

[9] John D. Storey, et al. Empirical Bayes analysis of a microarray experiment, 2001.

[10] Pierre R. Bushel, et al. Assessing gene significance from cDNA microarray expression data via mixed models, 2001, J. Comput. Biol.

[11] J. L. Hodges, et al. Discriminatory analysis - nonparametric discrimination: consistency properties, 1989.