A Hybrid of SVM and SCAD with Group-Specific Tuning Parameter for Pathway-Based Microarray Analysis

The incorporation of pathway data into microarray analysis has led to a new era in the understanding of biological processes. However, this advance is limited by two issues in the quality of pathway data. First, pathway data are usually compiled without reference to a specific biological context, so for a particular cellular process (e.g. lung cancer development) only a few genes within a pathway may be responsible for that process. Second, pathway data are commonly curated from the literature, so some pathways may include uninformative genes while informative genes are excluded. In this paper, we propose a hybrid of the support vector machine and smoothly clipped absolute deviation with group-specific tuning parameters (gSVM-SCAD) to select informative genes within pathways before the pathway evaluation process. Our experiments on lung cancer and gender data sets show that gSVM-SCAD obtains significant results in classification accuracy and in selecting informative genes and pathways.
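
To make the idea of group-specific tuning parameters concrete, the sketch below evaluates the SCAD penalty in its standard published form, with one tuning parameter per pathway instead of a single global value. It is only an illustration under stated assumptions, not the gSVM-SCAD implementation: the function names, the dictionary of per-pathway values `lams`, and the default `a = 3.7` (a commonly used choice) are assumptions, and the SVM hinge-loss term and the procedure for choosing each tuning parameter are omitted.

```python
def scad_penalty(w, lam, a=3.7):
    """Standard SCAD penalty for one weight w with tuning parameter lam.

    Linear (lasso-like) near zero, quadratic in the middle range,
    and constant for large |w|, so large weights are not over-shrunk.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    x = abs(w)
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return -(x * x - 2 * a * lam * x + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def group_scad_penalty(weights, groups, lams, a=3.7):
    """Sum of SCAD penalties where each gene uses its pathway's own lam.

    weights: one classifier weight per gene
    groups:  pathway label for each gene (hypothetical labels)
    lams:    dict mapping pathway label -> tuning parameter (assumed given)
    """
    return sum(scad_penalty(w, lams[g], a) for w, g in zip(weights, groups))


# Hypothetical example: three genes in two pathways, with a stricter
# tuning parameter for pathway "p2" than for pathway "p1".
example_weights = [0.5, 2.0, 10.0]
example_groups = ["p1", "p1", "p2"]
example_lams = {"p1": 1.0, "p2": 2.0}
total_penalty = group_scad_penalty(example_weights, example_groups, example_lams)
```

Giving each pathway its own tuning parameter lets the amount of shrinkage differ between pathways, which is the property the abstract relies on for selecting informative genes within each pathway.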
