Semi-supervised SVM-based Feature Selection for Cancer Classification using Microarray Gene Expression Data

Gene expression data are typically high-dimensional, so feature selection is a fundamental tool in cancer classification. Unlabelled data can be collected easily and are useful for improving classification accuracy, whereas label information is usually difficult to obtain because labelling is tedious, costly and error-prone. Previous studies of gene selection are mostly dedicated to supervised and unsupervised approaches. The support vector machine (SVM) is a common supervised technique for gene selection and cancer classification. This paper therefore proposes a semi-supervised SVM-based feature selection method (S$^3$VM-FS) that simultaneously exploits knowledge from unlabelled and labelled data. Experimental results on lung cancer gene expression data show that S$^3$VM-FS achieves higher accuracy and requires shorter processing time than the well-known supervised method, SVM-based recursive feature elimination (SVM-RFE), and the improved method, S$^3$VM-RFE.