Classification of Colorectal Cancer Using Clustering and Feature Selection Approaches

Accurate classification of cancer and of responses to treatment is important in clinical cancer research because cancer is a family of gene-based diseases. Microarray technology has been widely developed to measure changes in gene expression levels under normal and experimental conditions. Gene expression data are typically high-dimensional and have small sample sizes. Feature selection is therefore needed to find the smallest set of informative genes and to improve both classification accuracy and biological interpretability. Because some feature selection methods neglect the interactions among genes, clustering is used to group similar genes together. In addition, the quality of the selected data can determine the effectiveness of the classifiers. This research proposes clustering and feature selection approaches for classifying colorectal cancer gene expression data. A feature selection approach based on centroid clustering provided higher classification accuracy than the other approaches.
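
The general idea of selecting genes through centroid clustering can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain k-means as the centroid-based clustering step and keeps, from each cluster, the single gene closest to the centroid. The function names (`sq_dist`, `kmeans`, `select_representative_genes`) and the toy expression matrix are hypothetical and exist only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; returns (labels, centroids). Assumed stand-in for
    the 'centroid clustering' step, not the method used in the paper."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every gene to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean profile of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def select_representative_genes(expr, k, seed=0):
    """Cluster genes (rows of expr) and keep the gene nearest each centroid."""
    labels, centroids = kmeans(expr, k, seed=seed)
    selected = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            selected.append(
                min(members, key=lambda i: sq_dist(expr[i], centroids[c])))
    return sorted(selected)


# Toy matrix (hypothetical): rows are genes, columns are samples.
toy_expr = [
    [1.0, 1.1, 0.9], [1.2, 1.0, 1.0], [0.8, 0.9, 1.1],   # low-expression group
    [5.0, 5.2, 4.9], [5.1, 5.0, 5.0], [4.9, 4.8, 5.1],   # high-expression group
]
chosen = select_representative_genes(toy_expr, k=2)
print("Selected gene indices:", chosen)
```

The selected rows would then form the reduced feature set passed to a classifier. Keeping one representative per cluster is only one possible design choice; ranking genes within each cluster by a relevance score is another.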
