Dimension reduction-based penalized logistic regression for cancer classification using microarray data

The use of penalized logistic regression for cancer classification using microarray expression data is presented. Two dimension reduction methods are respectively combined with the penalized logistic regression so that both the classification accuracy and computational speed are enhanced. Two other machine-learning methods, support vector machines and least-squares regression, have been chosen for comparison. It is shown that our methods have achieved at least equal or better results. They also have the advantage that the output probability can be explicitly given and the regression coefficients are easier to interpret. Several other aspects, such as the selection of penalty parameters and components, pertinent to the application of our methods for cancer classification are also discussed.

[1]  Aik Choon Tan,et al.  Ensemble machine learning on gene expression data for cancer classification. , 2003, Applied bioinformatics.

[2]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[3]  Roman Rosipal,et al.  Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Space , 2002, J. Mach. Learn. Res..

[4]  Peter Bühlmann,et al.  Finding predictive gene groups from microarray data , 2004 .

[5]  Nello Cristianini,et al.  Support vector machine classification and validation of cancer tissue samples using microarray expression data , 2000, Bioinform..

[6]  S. J. Press,et al.  Choosing between Logistic Regression and Discriminant Analysis , 1978 .

[7]  Michael G. Schimek,et al.  Penalized Logistic Regression in Gene Expression Analysis , 2003 .

[8]  Paul H. C. Eilers,et al.  Classification of microarray data with penalized logistic regression , 2001, SPIE BiOS.

[9]  P. Goodfellow,et al.  DNA microarrays in drug discovery and development , 1999, Nature Genetics.

[10]  M. Bittner,et al.  Expression profiling using cDNA microarrays , 1999, Nature Genetics.

[11]  Danh V. Nguyen,et al.  Tumor classification by partial least squares using microarray gene expression data , 2002, Bioinform..

[12]  T. Hastie,et al.  Classification of gene microarrays by penalized logistic regression. , 2004, Biostatistics.

[13]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[14]  A. E. Hoerl,et al.  Ridge regression: biased estimation for nonorthogonal problems , 2000 .

[15]  S. Cessie,et al.  Ridge Estimators in Logistic Regression , 1992 .

[16]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[17]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[18]  Carsten Peterson,et al.  Analyzing tumor gene expression profiles , 2003, Artif. Intell. Medicine.

[19]  B. Efron The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis , 1975 .

[20]  Gene H. Golub,et al.  Matrix computations , 1983 .

[21]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[22]  Jacob A. Wegelin,et al.  A Survey of Partial Least Squares (PLS) Methods, with Emphasis on the Two-Block Case , 2000 .