Ensemble of Support Vector Machines to Improve the Cancer Class Prediction Based on the Gene Expression Profiles

DNA microarrays provide rich profiles that are used for cancer prediction based on gene expression levels across a collection of samples. Support Vector Machines (SVMs) have been applied to the classification of cancer samples with encouraging results. However, they are usually based on Euclidean distances, which fail to reflect the sample proximities accurately. Moreover, SVM classifiers based on non-Euclidean dissimilarities fail to reduce the errors significantly. In this paper, we propose an ensemble of SVM classifiers in order to reduce the errors. Diversity among the classifiers is induced by considering a set of complementary dissimilarities and kernels. The experimental results suggest that our algorithm improves on classifiers based on a single dissimilarity and on a combination strategy such as Bagging.
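The idea of inducing diversity through complementary dissimilarities can be illustrated with a small sketch. The abstract does not specify which dissimilarities, kernels, or combination rule are used, so everything below is an assumption made for illustration: the three dissimilarities (Euclidean, Manhattan, one minus Pearson correlation), the majority vote, and the toy expression values are all hypothetical. To keep the example dependency-free, a 1-nearest-neighbour rule stands in for each SVM base classifier; it is not the method proposed in the paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def correlation_dissimilarity(a, b):
    """One minus the Pearson correlation; ignores overall expression scale."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # correlation undefined for a constant profile
    return 1.0 - cov / math.sqrt(var_a * var_b)


def nearest_neighbour_label(sample, train_x, train_y, dissimilarity):
    """Stand-in base classifier (NOT an SVM): label of the closest training sample."""
    best = min(range(len(train_x)), key=lambda i: dissimilarity(sample, train_x[i]))
    return train_y[best]


def ensemble_predict(sample, train_x, train_y, dissimilarities):
    """Train one base classifier per dissimilarity and combine by majority vote."""
    votes = [
        nearest_neighbour_label(sample, train_x, train_y, d) for d in dissimilarities
    ]
    return Counter(votes).most_common(1)[0][0], votes


# Hypothetical toy data: four samples, four genes each (not from the paper).
train_x = [[1, 2, 3, 4], [2, 3, 4, 5], [4, 3, 2, 1], [5, 4, 3, 2]]
train_y = ["tumor", "tumor", "normal", "normal"]
dissimilarities = [euclidean, manhattan, correlation_dissimilarity]

label, votes = ensemble_predict([1.5, 2.5, 3.5, 4.6], train_x, train_y, dissimilarities)
print(label, votes)
```

Using an odd number of base classifiers avoids tied votes in a two-class problem. In an SVM-based version, each dissimilarity would instead be turned into a kernel or used to build a dissimilarity-based representation before training, a step this sketch omits.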
