Cancer classification using a novel gene selection approach by means of shuffling based on data clustering with optimization

An innovative gene selection approach that applies a shuffle method prior to cancer classification is proposed. A novel optimization algorithm, COA-GA, is developed by integrating the cuckoo optimization algorithm (COA) with a genetic algorithm (GA) to enhance classification performance. The performance of COA-GA is analyzed and compared with that of GA, PSO and COA. It is further confirmed that traditional clustering has no impact on gene selection or classification performance. Optimization-based clustering is shown to enhance the accuracy of gene selection and classification.

This research presents an innovative method for cancer identification and type classification using microarray data. The method is based on gene selection with shuffling, combined with unconventional, optimization-based data clustering. A new hybrid optimization algorithm, COA-GA, is developed by combining the recently introduced Cuckoo Optimization Algorithm (COA) with a more traditional genetic algorithm (GA); it is used for data clustering in order to select the most dominant genes using shuffling. For classification, Support Vector Machine (SVM) and Multilayer Perceptron (MLP) artificial neural networks are used. The literature suggests that data clustering with traditional approaches such as K-means, C-means and hierarchical clustering has no impact on classification accuracy, and this investigation confirms that finding. However, the results show that optimization-based clustering with shuffling increases classification accuracy significantly. The proposed algorithm (COA-GA) not only outperforms COA, GA and Particle Swarm Optimization (PSO) in classification performance but also reaches a better minimum within only a few iterations. The SVM classifier achieved higher accuracy than the MLP on all datasets used.
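The general idea of optimization-based clustering for gene selection can be illustrated with a minimal sketch. It is not the COA-GA algorithm described above: the abstract does not specify the cuckoo component or the shuffling step, so a plain genetic algorithm stands in as the optimizer and both are omitted. The function names (`ga_cluster`, `select_representatives`), the cost function (sum of squared distances to the nearest centroid) and all parameter values are assumptions made for illustration only.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def clustering_cost(centroids, genes):
    """Sum of squared distances from each gene profile to its nearest centroid."""
    return sum(min(sq_dist(g, c) for c in centroids) for g in genes)

def ga_cluster(genes, k, pop_size=20, generations=40, mutation_scale=0.1, seed=0):
    """Search for k centroids with a simple GA (an assumed stand-in for COA-GA)."""
    rng = random.Random(seed)
    # Each individual is a list of k centroids, initialised from random gene profiles.
    population = [[list(rng.choice(genes)) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: clustering_cost(ind, genes))
        parents = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Crossover: take each centroid from one of the two parents.
            child = [list(rng.choice((c1, c2))) for c1, c2 in zip(p1, p2)]
            # Mutation: small Gaussian perturbation of every coordinate.
            for c in child:
                for j in range(len(c)):
                    c[j] += rng.gauss(0.0, mutation_scale)
            children.append(child)
        population = parents + children
    return min(population, key=lambda ind: clustering_cost(ind, genes))

def select_representatives(genes, centroids):
    """Return the sorted indices of the genes nearest to each centroid."""
    return sorted({min(range(len(genes)), key=lambda i: sq_dist(genes[i], c))
                   for c in centroids})

# Toy data: six hypothetical gene profiles forming two obvious groups.
toy_genes = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
             [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
best = ga_cluster(toy_genes, k=2)
selected = select_representatives(toy_genes, best)
print(selected)
```

In this sketch one gene per cluster is kept as a representative, and the resulting subset would then be passed to a classifier such as an SVM or MLP. How the paper actually derives the selected genes from the clusters is not stated in the abstract.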
