Fast support vector classifier applied to microarray data

Since the introduction of DNA microarray technology, there has been enormous interest in its clinical application to the diagnosis of various diseases. Classifying microarray data is difficult for biologists because small sample sizes combined with a very large number of features increase the risk of overfitting. In recent years, many tools have been developed to extract biological information from microarray data, but no commonly accepted method has emerged. In this paper, we propose a processing method that combines a Fast Support Vector Classifier with a feature selection scheme based on the R package LIMMA. The proposed method was tested on a lung cancer gene expression dataset provided as part of the IMPROVER Diagnostic Signature Challenge. Algorithm performance was evaluated with the BCM, AUPR (area under the precision-recall curve), and CCEM scoring metrics defined by the IMPROVER organizers, and the results were encouraging.
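
The general shape of such a pipeline (rank genes by a differential-expression statistic, keep the top k, train a classifier on the reduced data, score its class beliefs with AUPR) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: a Welch t-statistic stands in for LIMMA's moderated t-statistic, a Gaussian-kernel (Parzen window) classifier stands in for the Fast Support Vector Classifier, and AUPR is estimated as average precision, which may differ from the IMPROVER organizers' exact definition. All function names and the `gamma` parameter are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch two-sample t-statistic (assumed stand-in for LIMMA's moderated t)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / se


def select_features(X: List[List[float]], y: List[int], k: int) -> List[int]:
    """Indices of the k genes with the largest |t| between class 1 and class 0."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


def project(X: List[List[float]], idx: List[int]) -> List[List[float]]:
    """Keep only the selected gene columns."""
    return [[row[j] for j in idx] for row in X]


def predict_proba(X_train, y_train, X_test, gamma: float = 1.0) -> List[float]:
    """Belief that each test sample is class 1 (Parzen-window stand-in classifier).

    Summing Gaussian kernels per class without dividing by class size gives the
    kernel-density posterior under the empirical class priors.
    """
    beliefs = []
    for x in X_test:
        score = {0: 0.0, 1: 0.0}
        for s, label in zip(X_train, y_train):
            d2 = sum((xi - si) ** 2 for xi, si in zip(x, s))
            score[label] += math.exp(-gamma * d2)
        total = score[0] + score[1]
        # If every kernel underflows to zero, fall back to an uninformative 0.5.
        beliefs.append(score[1] / total if total > 0.0 else 0.5)
    return beliefs


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPR estimated as average precision over the ranking induced by scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank  # precision at each new recall level
    return ap / n_pos if n_pos else 0.0


if __name__ == "__main__":
    # Toy data: 6 samples x 3 genes; only gene 0 separates the classes.
    X = [[5.0, 1.0, 0.2], [5.2, 0.9, 0.1], [4.8, 1.1, 0.3],
         [1.0, 1.0, 0.2], [1.2, 0.8, 0.1], [0.9, 1.2, 0.3]]
    y = [1, 1, 1, 0, 0, 0]
    idx = select_features(X, y, k=1)  # fit on training samples only
    X_test = [[4.9, 1.0, 0.2], [1.1, 1.0, 0.2]]
    beliefs = predict_proba(project(X, idx), y, project(X_test, idx))
    print(idx, beliefs, average_precision([1, 0], beliefs))
```

Note that feature selection is fit on the training samples only. With few samples and thousands of genes, selecting genes on the full dataset before evaluation would leak test information and inflate the scores, which is exactly the overfitting risk described above.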
