Effective dimension reduction methods for tumor classification using gene expression data

MOTIVATION: One application of microarray data is to uncover the molecular variation among cancers. A characteristic feature of microarray studies is that the number n of samples collected is small relative to the number p of genes measured per sample, which is usually in the thousands. In statistical terms, having far more predictors than observations makes the classification problem difficult. One effective way to address this problem is to combine statistical dimension reduction techniques with nonparametric discriminant procedures.

RESULTS: We view the classification problem as a regression problem with few observations and many predictor variables. We use an adaptive dimension reduction method for generalized semi-parametric regression models that allows us to overcome the 'curse of dimensionality' arising in the context of expression data. The predictive performance of the resulting classification rule is illustrated on two well-known data sets from the microarray literature: the leukemia data set, whose classes are known to be easily separable, and the colon data set.
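The general pipeline described above (reduce the dimension first, then apply a nonparametric classifier) can be illustrated with a minimal sketch. The abstract does not specify the details of the adaptive method, so this example substitutes sliced inverse regression (SIR), a simpler and well-known dimension reduction technique, preceded by an SVD step to cope with n much smaller than p, and followed by a k-nearest-neighbour rule. The synthetic data, the function names (`svd_reduce`, `sir_directions`, `knn_predict`), and all parameter values are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def svd_reduce(X, k):
    """Return the column means of X and its top-k right singular vectors (p x k)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def sir_directions(Z, y, d=1, ridge=1e-6):
    """Sliced inverse regression using one slice per class label.

    Z is an n x k matrix of (already reduced) predictors, y holds class labels.
    Returns the mean of Z and a k x d matrix of estimated directions.
    """
    mu = Z.mean(axis=0)
    Zc = Z - mu
    # A small ridge term keeps the covariance invertible (an assumption of this sketch).
    cov = Zc.T @ Zc / len(Z) + ridge * np.eye(Z.shape[1])
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T  # inverse square root of cov
    Zw = Zc @ W                                   # whitened predictors
    # Weighted covariance of the slice (class) means of the whitened predictors.
    M = np.zeros((Z.shape[1], Z.shape[1]))
    for label in np.unique(y):
        in_slice = y == label
        m = Zw[in_slice].mean(axis=0)
        M += in_slice.mean() * np.outer(m, m)
    _, vecs = np.linalg.eigh(M)                   # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :d]                    # leading d eigenvectors
    return mu, W @ top                            # map back to the scale of Z

def knn_predict(train_scores, train_y, test_scores, k=3):
    """Majority vote among the k nearest training points (labels must be 0/1)."""
    diffs = test_scores[:, None, :] - train_scores[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)

# Synthetic stand-in for expression data: 60 samples, 500 "genes",
# of which the first 20 are shifted in class 1 (purely illustrative).
rng = np.random.default_rng(0)
n, p = 60, 500
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :20] += 3.0

# Random train/test split.
order = rng.permutation(n)
train, test = order[:40], order[40:]

# Step 1: SVD pre-reduction fitted on the training samples only.
x_mean, V = svd_reduce(X[train], k=5)
Z_train = (X[train] - x_mean) @ V
Z_test = (X[test] - x_mean) @ V

# Step 2: SIR direction in the reduced space.
z_mean, B = sir_directions(Z_train, y[train], d=1)
s_train = (Z_train - z_mean) @ B
s_test = (Z_test - z_mean) @ B

# Step 3: nonparametric classification on the one-dimensional scores.
pred = knn_predict(s_train, y[train], s_test, k=3)
accuracy = (pred == y[test]).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Fitting both reduction steps on the training samples alone, and only then projecting the test samples, avoids an optimistic bias in the estimated error rate. With two classes, SIR can recover at most one direction, which is why `d=1` is used here.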
