Performance analysis of classifiers for colon cancer detection from dimensionality reduced microarray gene data

Cancer accounted for over 9.6 million deaths in 2018, roughly one in six deaths worldwide. Colon cancer is the second most prominent cause of these deaths, with around 1.8 million cases. This research aims to detect colon cancer from a microarray dataset. It will aid experts in distinguishing cancer cells from normal cells so that cancer can be diagnosed and treated at earlier stages, which increases the survival rate of patients. The high dimensionality of a microarray dataset, with few samples and many attributes, degrades the detection capability of a classifier. Hence, dimensionality reduction techniques are needed to preserve the significant genes that are prominent in disease classification. In this article, the ANOVA method is first used to select the best genes, and then principal component analysis (PCA) and fuzzy C‐means clustering (FCM) techniques are employed to further choose relevant genes. The PCA and FCM features are classified using model, discriminant, regression, hybrid, and heuristic‐based classifiers. The results show that the heuristic classifier with PCA features attains an average classification accuracy of 97.92% for both the colon cancer and normal samples. With FCM features, the heuristic classifier attains average classification accuracies of 99.48% and 97.92% for the colon cancer and normal samples, respectively. The heuristic classifier achieves higher accuracy than all other classifiers in the classification of colon cancer.
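The two-stage reduction described above (ANOVA gene selection followed by PCA or FCM) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the synthetic data, the matrix sizes, the number of retained genes and components, the fuzzifier `m = 2`, and the use of FCM sample memberships as features are all assumptions made for illustration, and the function names are hypothetical. The abstract does not state how the FCM features are formed or which classifiers are applied afterwards, so the classification step is left out.

```python
import numpy as np


def anova_f_scores(X, y):
    """One-way ANOVA F statistic for every gene (column of X) across the classes in y."""
    classes = np.unique(y)
    n, k = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (mean_c - grand_mean) ** 2
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)
    # Guard against division by zero for genes with no within-class variance.
    return (ss_between / (k - 1)) / np.maximum(ss_within / (n - k), 1e-12)


def pca_project(X, n_components):
    """Project the samples onto the leading principal axes (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy C-means updates; returns (memberships, cluster centers)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # each sample's memberships sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = np.maximum(dist, 1e-12) ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers


# Synthetic stand-in for a microarray matrix (assumed sizes): 40 samples x 500 genes,
# where only the first 10 genes differ between the two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))
y = np.repeat([0, 1], 20)
X[y == 1, :10] += 3.0

scores = anova_f_scores(X, y)
top = np.argsort(scores)[::-1][:10]    # stage 1: keep the 10 highest-scoring genes
X_sel = X[:, top]

Z = pca_project(X_sel, 3)              # stage 2a: PCA features
U, centers = fuzzy_c_means(X_sel, 2)   # stage 2b: FCM memberships used as features
```

Either `Z` or `U` could then be passed to a classifier. Ranking genes before PCA or FCM keeps the second stage small, which matters when attributes far outnumber samples.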
