GECC: Gene Expression Based Ensemble Classification of Colon Samples

Gene expression deviates from its normal profile when a patient has cancer, and this variation can be exploited as an effective tool for cancer detection. In this study, we propose a novel gene-expression-based colon classification scheme (GECC) that uses these variations to classify colon gene samples into normal and malignant classes. The novelty of GECC is twofold. First, to cope with the very high dimensionality of gene-expression data sets, several feature selection and extraction strategies are employed: chi-square, F-score, principal component analysis (PCA), and minimum redundancy maximum relevance (mRMR). These reduce each sample to a small set of discriminative genes (or, for PCA, principal components). Second, a majority-voting ensemble of support vector machines (SVMs) is proposed to classify the resulting gene-based samples. Individual SVM models have previously been used for colon classification, but their performance is limited. In GECC, the individual SVM models are instead trained with different kernels (linear, polynomial, radial basis function (RBF), and sigmoid), and their predictions are combined through majority voting, which makes the combined decision space more discriminative. The proposed technique has been tested on four colon data sets and several other binary-class gene-expression data sets. It achieves improved performance compared with previously reported gene-based colon cancer detection techniques. For the 208 × 5,851 data set, training and testing required 591.01 s and 0.019 s, respectively.
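
The two ingredients described above, ranking genes by a discriminative score and combining kernel-specific SVMs by majority voting, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the common Fisher-style F-score definition for binary labels 0/1: the separation of class means from the overall mean, divided by the sum of within-class sample variances. It also assumes the per-kernel SVM predictions are already available, for example from scikit-learn's `SVC(kernel=...)` with `"linear"`, `"poly"`, `"rbf"`, and `"sigmoid"`. The helper names `f_score`, `top_k_genes`, and `majority_vote` are hypothetical, and so is the tie-breaking rule. The abstract does not say how a 2-2 split among four kernels is resolved.

```python
from collections import Counter
from statistics import mean, variance


def f_score(values, labels):
    """F-score of one gene for binary labels (1 = malignant, 0 = normal).

    Assumed definition: squared distance of each class mean from the overall
    mean, divided by the sum of the two within-class sample variances.
    Each class needs at least two samples for variance() to be defined.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    denominator = variance(pos) + variance(neg)  # sample variances (n - 1)
    if denominator == 0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def top_k_genes(X, labels, k):
    """Return indices of the k genes (columns of X) with the highest F-score."""
    n_genes = len(X[0])
    scores = [f_score([row[j] for row in X], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


def majority_vote(predictions_per_model):
    """Combine per-sample labels from several models by majority voting.

    predictions_per_model[m][i] is model m's label for sample i.
    Ties (possible with an even number of models) go to the label of the
    earliest model that voted for a top-count label: a hypothetical rule.
    """
    combined = []
    for sample_votes in zip(*predictions_per_model):
        counts = Counter(sample_votes)
        best = max(counts.values())
        combined.append(next(v for v in sample_votes if counts[v] == best))
    return combined


if __name__ == "__main__":
    # Toy data: 4 samples x 3 genes; labels 0 = normal, 1 = malignant.
    X = [[1.0, 5.0, 0.2], [1.2, 5.1, 0.9], [3.0, 5.0, 0.4], [3.3, 4.9, 0.1]]
    labels = [0, 0, 1, 1]
    print(top_k_genes(X, labels, 2))  # [0, 1]

    # Hypothetical predictions from linear, polynomial, RBF and sigmoid SVMs.
    votes = [[0, 1, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
    print(majority_vote(votes))  # [0, 1, 0]
```

With four voters a tie is possible, so the sketch falls back to the label of the earliest model that cast a top-count vote. Using an odd number of models or weighting each model by its validation accuracy would be reasonable alternatives.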
