A novel approach for automatic gene selection and classification of gene based colon cancer datasets

Colon cancer substantially alters gene expression in human tissue, and these deviations in expression levels can be exploited to diagnose colon cancer automatically. The major challenge in analyzing gene expression datasets is their high dimensionality, so efficient techniques are needed to select discriminative genes. In this article, we propose a novel classification technique that exploits variations in gene expression to classify colon gene samples as normal or malignant, while handling the high dimensionality of gene-based datasets. Individual feature selection techniques have previously been used to select discriminative genes, but their performance is limited. We therefore propose a feed-forward gene selection technique in which two feature selection techniques are applied in sequence: the genes selected by the first technique are passed as input to the second, which selects a further subset from them. The selected genes are then classified using a support vector machine (SVM) with a linear kernel. The proposed technique has been tested on three standard colon cancer datasets, on which the feed-forward approach to gene selection showed improved performance. Feed-forward gene selection also substantially reduces the size of gene-based datasets, and with it the computation time. The proposed technique has further been compared with existing techniques for colon cancer diagnosis and performed better than them. We therefore expect that the proposed technique can be used effectively for the diagnosis of colon cancer.
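The feed-forward idea described above can be illustrated with a minimal sketch. The abstract does not name the two feature selection techniques, so the choices here are assumptions made purely for illustration: an ANOVA F-score filter as the first stage and a mutual-information filter as the second, both from scikit-learn. The subset sizes (`k_first`, `k_second`), the helper name `build_feed_forward_pipeline`, and the synthetic data are likewise hypothetical and do not reproduce the authors' setup or datasets.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_feed_forward_pipeline(k_first=200, k_second=20, seed=0):
    """Chain two feature selectors, then a linear-kernel SVM.

    Assumption: F-score and mutual information stand in for the two
    (unnamed) selection techniques of the abstract.
    """
    # Fix the random state so the mutual-information estimate is repeatable.
    second_score = partial(mutual_info_classif, random_state=seed)
    return Pipeline([
        # Stage 1: keep the k_first genes with the highest ANOVA F-score.
        ("stage1", SelectKBest(f_classif, k=k_first)),
        # Stage 2: sees only the stage-1 genes and keeps k_second of them.
        ("stage2", SelectKBest(second_score, k=k_second)),
        # Classifier: SVM with a linear kernel, as stated in the abstract.
        ("svm", SVC(kernel="linear")),
    ])


# Synthetic stand-in for a gene expression matrix (samples x genes).
X, y = make_classification(
    n_samples=62, n_features=2000, n_informative=20,
    n_redundant=0, random_state=0,
)

pipeline = build_feed_forward_pipeline()
pipeline.fit(X, y)

# Map the stage-2 selection back to column indices of the original matrix.
stage1_idx = pipeline.named_steps["stage1"].get_support(indices=True)
stage2_local = pipeline.named_steps["stage2"].get_support(indices=True)
selected_genes = stage1_idx[stage2_local]

print("genes after stage 1:", len(stage1_idx))
print("genes after stage 2:", len(selected_genes))
```

Wrapping both selectors and the classifier in one pipeline means that, under cross-validation, the gene selection is refit on each training fold only, which avoids leaking test samples into the selection step. The second stage operates on far fewer columns than the first, which is where the reduction in computation time mentioned above would come from in a sketch like this.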
