Combining Dissimilarities in a Hyper Reproducing Kernel Hilbert Space for Complex Human Cancer Prediction

Support vector machines (SVMs) have been applied to the classification of cancer samples using gene expression profiles. However, they rely on Euclidean distances, which fail to reflect accurately the proximities among sample profiles. Non-Euclidean dissimilarities therefore provide additional information that should be considered to reduce misclassification errors. In this paper, we incorporate a linear combination of non-Euclidean dissimilarities into the classical nu-SVM algorithm. The weights of the combination are learnt in a hyper reproducing kernel Hilbert space (HRKHS) using an efficient semidefinite programming algorithm. This approach allows us to incorporate a smoothing term that penalizes the complexity of the family of distances and avoids overfitting. The experimental results suggest that the proposed method helps to reduce misclassification errors in several human cancer problems.
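
As a rough illustration of the idea of combining dissimilarities, the following sketch turns several dissimilarity matrices into positive semidefinite kernels and forms their weighted sum. It is not the method of the paper: the weights here are fixed by hand rather than learnt in an HRKHS by semidefinite programming, and the choice of dissimilarities, the double-centering step, the eigenvalue clipping, and all function names are assumptions made for the example.

```python
import numpy as np

def euclidean(x):
    # Pairwise Euclidean distances between the rows (samples) of x.
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def manhattan(x):
    # Pairwise L1 (city-block) distances between the rows of x.
    return np.abs(x[:, None, :] - x[None, :, :]).sum(axis=-1)

def correlation_dissimilarity(x):
    # One minus the Pearson correlation between sample profiles (illustrative choice).
    return 1.0 - np.corrcoef(x)

def dissimilarity_to_kernel(d):
    # Assumed conversion: double-center the squared dissimilarities, as in
    # classical scaling, then clip negative eigenvalues so the result is PSD.
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    k = -0.5 * j @ (d ** 2) @ j
    k = (k + k.T) / 2.0
    w, v = np.linalg.eigh(k)
    k_psd = (v * np.clip(w, 0.0, None)) @ v.T
    return (k_psd + k_psd.T) / 2.0

def combine_kernels(kernels, weights):
    # Linear combination with non-negative weights normalized to sum to one.
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    weights = weights / weights.sum()
    return sum(w * k for w, k in zip(weights, kernels))

# Toy data: 6 samples with 5 features each (random, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 5))

kernels = [dissimilarity_to_kernel(f(x))
           for f in (euclidean, manhattan, correlation_dissimilarity)]
k_comb = combine_kernels(kernels, weights=[0.5, 0.3, 0.2])  # hand-picked weights
```

The combined matrix `k_comb` could then be passed to any nu-SVM implementation that accepts a precomputed kernel, for instance scikit-learn's `NuSVC(kernel="precomputed")`. Learning the weights, and the smoothing term that penalizes the complexity of the family of distances, are deliberately left out of this sketch.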
