A hybrid of clustering and quantum genetic algorithm for relevant genes selection for cancer microarray data

Quantum variants of the genetic algorithm (GA), which combine quantum computing principles with genetic operators, have been suggested in the literature for efficiently exploring and exploiting large search spaces. Even with a quantum variant of the GA, however, the memory and computation time required for high-dimensional data such as microarrays are huge. In this paper, we propose a hybrid approach, ClusterQGA, that first uses clustering to select a small set of non-redundant representative genes and then applies the Quantum Genetic Algorithm (QGA) to determine a minimal set of relevant, non-redundant genes. We also propose a new fitness function that reduces the number of genes without sacrificing classification accuracy. Experiments on publicly available binary and multi-class cancer microarray datasets establish the effectiveness of the proposed approach relative to existing methods in terms of classification accuracy and number of features. The proposed approach also reduces the computation time of the Quantum Genetic Algorithm on high-dimensional microarray data.
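The two-stage idea can be illustrated with a minimal Python sketch. It is not the ClusterQGA implementation: the abstract does not specify the clustering method, the classifier, the fitness function, or the QGA update rule, so everything below is an assumption made for illustration. Here the clustering stage is a greedy correlation filter (`representatives`), the classifier is leave-one-out 1-nearest-neighbour, the fitness is accuracy minus a hypothetical size penalty (`weight`), and the QGA uses a simple rotation of qubit angles toward the best observed solution. The toy data and all function and parameter names are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def representatives(genes, threshold=0.9):
    """Stage 1 (assumed greedy clustering): keep a gene only if it is not
    strongly correlated with any representative kept so far."""
    reps = []
    for g, profile in enumerate(genes):
        if all(abs(pearson(profile, genes[r])) < threshold for r in reps):
            reps.append(g)
    return reps

def loo_accuracy(genes, labels, subset):
    """Leave-one-out 1-nearest-neighbour accuracy using only the genes in `subset`."""
    if not subset:
        return 0.0
    n = len(labels)
    hits = 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: sum((genes[g][i] - genes[g][j]) ** 2 for g in subset))
        hits += labels[nearest] == labels[i]
    return hits / n

def fitness(genes, labels, subset, total, weight=0.1):
    """Assumed fitness: accuracy minus a small penalty on the fraction of genes used."""
    return loo_accuracy(genes, labels, subset) - weight * len(subset) / total

def qga_select(genes, labels, candidates, pop=10, gens=30,
               delta=0.05 * math.pi, seed=0):
    """Stage 2 (simplified quantum-inspired GA over the candidate genes)."""
    rng = random.Random(seed)
    m = len(candidates)
    # Each qubit is an angle theta with P(bit = 1) = sin(theta)^2; pi/4 gives 0.5.
    thetas = [[math.pi / 4] * m for _ in range(pop)]
    best_bits, best_fit = [0] * m, float("-inf")
    for _ in range(gens):
        for ind in thetas:
            # Observation collapses each qubit to a classical bit.
            bits = [1 if rng.random() < math.sin(t) ** 2 else 0 for t in ind]
            subset = [c for c, b in zip(candidates, bits) if b]
            f = fitness(genes, labels, subset, m)
            if f > best_fit:
                best_bits, best_fit = bits, f
        # Rotation: nudge every qubit toward the best solution observed so far,
        # clamping the angle so no bit becomes fully deterministic.
        for ind in thetas:
            for k in range(m):
                step = delta if best_bits[k] else -delta
                ind[k] = min(max(ind[k] + step, 0.05), math.pi / 2 - 0.05)
    return [c for c, b in zip(candidates, best_bits) if b], best_fit

# Toy data: rows are genes, columns are samples (hypothetical values).
genes = [
    [0.1, 0.2, 0.15, 0.9, 1.0, 0.95],  # separates the two classes
    [0.2, 0.4, 0.3, 1.8, 2.0, 1.9],    # exact multiple of gene 0 (redundant)
    [0.5, 0.1, 0.9, 0.3, 0.7, 0.2],    # noise
    [0.4, 0.6, 0.5, 0.5, 0.4, 0.6],    # noise
]
labels = [0, 0, 0, 1, 1, 1]

reps = representatives(genes)                     # stage 1: drop redundant genes
selected, score = qga_select(genes, labels, reps)  # stage 2: search over the rest
print(reps, selected, score)
```

The point of the sketch is the division of labour: the first stage shrinks the search space before the evolutionary search begins, so each individual in the second stage carries one qubit per representative gene rather than one per gene in the full microarray. The size penalty in `fitness` stands in for the role the abstract assigns to the new fitness function, namely preferring smaller subsets when accuracy is equal.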
