PLS Dimension Reduction for Classification with Microarray Data

Partial Least Squares (PLS) dimension reduction is known to give good prediction accuracy in classification with high-dimensional microarray data. In this paper, the classification procedure consisting of PLS dimension reduction followed by linear discriminant analysis on the new components is compared with several state-of-the-art classification methods. A boosting algorithm is also applied to this classification method, and a simple procedure for choosing the number of PLS components is suggested. The connection between PLS dimension reduction and gene selection is examined, and a property of the first PLS component for binary classification is proved. We also show, using real data, how PLS can be used for data visualization. The whole study is based on 9 real microarray cancer data sets.
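As an illustration of the two-step procedure described in the abstract (PLS dimension reduction, then linear discriminant analysis on the resulting components), the following is a minimal sketch and not the paper's implementation. It assumes a binary response coded 0/1, a NIPALS-style PLS1 algorithm, a pooled-covariance LDA, and synthetic data standing in for a microarray data set. The function names `pls_fit`, `lda_fit` and `lda_predict`, the data dimensions and the fixed choice of two components are all hypothetical choices made for this example.

```python
import numpy as np

def pls_fit(X, y, n_components):
    """NIPALS-style PLS1: return the training mean and a matrix R such
    that the PLS components of new data are (X_new - mean) @ R."""
    x_mean = X.mean(axis=0)
    Xk = X - x_mean                  # centered (later deflated) predictors
    yc = y - y.mean()                # centered response
    W, P = [], []
    for _ in range(n_components):
        w = Xk.T @ yc                # weights: covariance of each gene with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # scores of the current component
        p = Xk.T @ t / (t @ t)       # loadings
        Xk = Xk - np.outer(t, p)     # deflate X; y needs no deflation in PLS1
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    R = W @ np.linalg.inv(P.T @ W)   # maps centered X directly to scores
    return x_mean, R

def lda_fit(T, y):
    """LDA with a pooled within-class covariance matrix."""
    classes = np.unique(y)
    means = np.array([T[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    resid = T - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(y) - len(classes))
    return classes, means, priors, np.linalg.pinv(cov)

def lda_predict(model, T):
    classes, means, priors, cov_inv = model
    # Linear discriminant score: t'S^-1 m - 0.5 m'S^-1 m + log(prior)
    quad = 0.5 * np.sum((means @ cov_inv) * means, axis=1)
    scores = T @ cov_inv @ means.T - quad + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Synthetic stand-in for expression data: 60 samples, 200 "genes",
# of which the first 20 are shifted in class 1 (an assumption of this demo).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[y == 1, :20] += 2.0

perm = rng.permutation(60)
train, test = perm[:40], perm[40:]

# Step 1: PLS dimension reduction, fitted on the training samples only.
x_mean, R = pls_fit(X[train], y[train], n_components=2)
T_train = (X[train] - x_mean) @ R
T_test = (X[test] - x_mean) @ R

# Step 2: LDA on the PLS components.
model = lda_fit(T_train, y[train])
y_pred = lda_predict(model, T_test)
accuracy = np.mean(y_pred == y[test])
print(f"test accuracy: {accuracy:.2f}")
```

The number of components is fixed at two here purely for illustration; in practice it would be chosen by a data-driven procedure such as cross-validation on the training set. The two columns of `T_train` can also be plotted against each other to visualize the samples.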
