A multi-SVMs design for cancer diagnosis using DNA microarray data

Microarray gene expression data provide useful information for the diagnosis of certain diseases. However, microarray data are typically very high-dimensional while the number of samples is small, so selecting informative genes remains a challenge. In this research, multiple support vector machines (MSVM) were designed for disease diagnosis. Each SVM was trained on a small subset of gene features, and the importance of the genes was evaluated by the structure error loss. The SVMs built on the most important genes were linearly combined to form the disease classifier. The algorithm was applied to an artificial dataset, and the human acute leukemia dataset was used as a test case.
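
The pipeline described in the abstract (one SVM per small gene subset, a loss-based ranking, then a linear combination of the best SVMs) can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: the synthetic data, the grouping of genes into consecutive pairs, the use of a regularised hinge loss as a stand-in for the "structure error loss", equal combination weights, and all function names and parameter values.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit (w, b) by batch subgradient descent on the regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the subgradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def decision(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def regularised_hinge_loss(w, b, X, y, lam=0.01):
    # Assumed proxy for the paper's "structure error loss".
    hinge = sum(max(0.0, 1 - yi * decision(w, b, xi)) for xi, yi in zip(X, y))
    return hinge / len(X) + 0.5 * lam * sum(wj * wj for wj in w)

# Artificial data (assumption): 40 samples, 10 genes, genes 0-3 informative.
rng = random.Random(0)
n_samples, n_genes = 40, 10
informative = {0, 1, 2, 3}
y = [1 if i % 2 == 0 else -1 for i in range(n_samples)]
X = [[1.5 * yi + rng.gauss(0, 0.5) if g in informative else rng.gauss(0, 1.0)
      for g in range(n_genes)] for yi in y]

# One SVM per small gene subset (assumption: consecutive pairs of genes).
groups = [(g, g + 1) for g in range(0, n_genes, 2)]
models = []
for genes in groups:
    Xg = [[row[g] for g in genes] for row in X]
    w, b = train_linear_svm(Xg, y)
    models.append((regularised_hinge_loss(w, b, Xg, y), genes, w, b))

# Rank subsets by loss and keep the two best SVMs.
models.sort(key=lambda m: m[0])
top = models[:2]
selected = [genes for _, genes, _, _ in top]

def ensemble_predict(row):
    # Linear combination of the selected SVMs (assumption: equal weights).
    score = sum(decision(w, b, [row[g] for g in genes])
                for _, genes, w, b in top) / len(top)
    return 1 if score >= 0 else -1

accuracy = sum(ensemble_predict(r) == yi for r, yi in zip(X, y)) / n_samples
print(selected, accuracy)
```

The accuracy printed here is measured on the training samples only, so it shows that the sketch runs end to end rather than estimating diagnostic performance; a held-out split or cross-validation would be needed for that.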
