Correlation Kernels for Support Vector Machines Classification with Applications in Cancer Data

High dimensional bioinformatics data sets provide an excellent and challenging research problem in machine learning area. In particular, DNA microarrays generated gene expression data are of high dimension with significant level of noise. Supervised kernel learning with an SVM classifier was successfully applied in biomedical diagnosis such as discriminating different kinds of tumor tissues. Correlation Kernel has been recently applied to classification problems with Support Vector Machines (SVMs). In this paper, we develop a novel and parsimonious positive semidefinite kernel. The proposed kernel is shown experimentally to have better performance when compared to the usual correlation kernel. In addition, we propose a new kernel based on the correlation matrix incorporating techniques dealing with indefinite kernel. The resulting kernel is shown to be positive semidefinite and it exhibits superior performance to the two kernels mentioned above. We then apply the proposed method to some cancer data in discriminating different tumor tissues, providing information for diagnosis of diseases. Numerical experiments indicate that our method outperforms the existing methods such as the decision tree method and KNN method.

[1]  Alexandre d'Aspremont,et al.  Support vector machine classification with indefinite kernels , 2007, Math. Program. Comput..

[2]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[3]  Nello Cristianini,et al.  Support Vector Machines and Kernel Methods: The New Generation of Learning Machines , 2002, AI Mag..

[4]  R. Spang,et al.  Predicting the clinical status of human breast cancer by using gene expression profiles , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Robert P. W. Duin,et al.  A Generalized Kernel Approach to Dissimilarity-based Classification , 2002, J. Mach. Learn. Res..

[6]  Bernard Haasdonk,et al.  Feature space interpretation of SVMs with indefinite kernels , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Costas Pitris,et al.  Support vector machines with the correlation kernel for the classification of Raman spectra , 2011, BiOS.

[8]  John D. Lafferty,et al.  Diffusion Kernels on Graphs and Other Discrete Input Spaces , 2002, ICML.

[9]  Zheng Zhang,et al.  An Analysis of Transformation on Non - Positive Semidefinite Similarity Matrix for Kernel Machines , 2005, ICML 2005.

[10]  Colin Campbell,et al.  Analysis of SVM with Indefinite Kernels , 2009, NIPS.

[11]  Klaus Obermayer,et al.  Classi cation on Pairwise Proximity , 2007 .

[12]  Costas Pitris,et al.  Classification of Raman spectra using the correlation kernel , 2011 .

[13]  Joachim M. Buhmann,et al.  Optimal Cluster Preserving Embedding of Nonmetric Proximity Data , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  D Haussler,et al.  Knowledge-based analysis of microarray gene expression data by using support vector machines. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[15]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[16]  A. Barabasi,et al.  The human disease network , 2007, Proceedings of the National Academy of Sciences.

[17]  Jieping Ye,et al.  Training SVM with indefinite kernels , 2008, ICML '08.