Multiclass Molecular Cancer Classification by Kernel Subspace Methods with Effective Kernel Parameter Selection

Microarray techniques provide new insights into the molecular classification of cancer types, which is critical for cancer treatment and diagnosis. Recently, an increasing number of supervised machine learning methods have been applied to cancer classification problems using gene expression data. Support vector machines (SVMs), in particular, have become one of the leading and most effective methods. However, few studies in the literature have applied other kernel methods to these problems. We apply a kernel subspace (KS) method to multiclass cancer classification problems and assess its validity by comparing it with multiclass SVMs. Our comparative study on seven multiclass cancer datasets demonstrates that the KS method performs comparably to multiclass SVMs. Furthermore, we propose an effective criterion for kernel parameter selection, which is shown to be useful for the computation of the KS method.
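To make the idea of a kernel subspace classifier concrete, the following is a minimal sketch of one common formulation: each class is represented by the leading eigenvectors of its own (uncentered) kernel matrix, and a sample is assigned to the class whose subspace yields the largest squared projection length in feature space. This is an illustration under stated assumptions, not necessarily the exact method evaluated here. The Gaussian (RBF) kernel, the class name `KernelSubspaceClassifier`, and the parameters `gamma` and `n_components` are all hypothetical choices, and the kernel parameter selection criterion proposed in the paper is not reproduced: `gamma` is simply set by hand.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative distances


class KernelSubspaceClassifier:
    """Illustrative class-wise kernel subspace classifier (assumed formulation)."""

    def __init__(self, gamma=1.0, n_components=2):
        self.gamma = gamma                # RBF width, chosen by hand in this sketch
        self.n_components = n_components  # subspace dimension per class

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = []
        for c in self.classes_:
            Xc = X[y == c]
            K = rbf_kernel(Xc, Xc, self.gamma)
            evals, evecs = np.linalg.eigh(K)  # eigenvalues in ascending order
            idx = np.argsort(evals)[::-1][: self.n_components]
            evals, evecs = evals[idx], evecs[:, idx]
            keep = evals > 1e-10              # drop numerically null directions
            # Expansion coefficients of orthonormal basis vectors in feature space.
            coeffs = evecs[:, keep] / np.sqrt(evals[keep])
            self.models_.append((Xc, coeffs))
        return self

    def projection_lengths(self, X):
        """Squared projection length of each sample onto each class subspace."""
        scores = np.empty((X.shape[0], len(self.classes_)))
        for j, (Xc, coeffs) in enumerate(self.models_):
            Kx = rbf_kernel(X, Xc, self.gamma)
            scores[:, j] = ((Kx @ coeffs) ** 2).sum(axis=1)
        return scores

    def predict(self, X):
        return self.classes_[np.argmax(self.projection_lengths(X), axis=1)]
```

With an expression matrix `X` of shape (samples, genes) and integer labels `y`, usage would be `KernelSubspaceClassifier(gamma=0.5, n_components=3).fit(X, y).predict(X_new)`. In practice both `gamma` and `n_components` would need to be tuned, for example by cross-validation, which is the step a dedicated kernel parameter selection criterion aims to make cheaper.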
