Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification

This paper introduces a novel approach to gene selection based on a substantial modification of the analytic hierarchy process (AHP). The modified AHP systematically integrates the outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods, namely the t-test, entropy, the receiver operating characteristic (ROC) curve, the Wilcoxon test and the signal-to-noise ratio, are employed to rank genes. The resulting rankings then serve as inputs to the modified AHP. The paper also proposes a method that uses the fuzzy standard additive model (FSAM) for cancer classification based on the genes selected by AHP. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. A genetic algorithm (GA) is incorporated between the unsupervised and supervised training stages to optimize the number of fuzzy rules. Integrating the GA enables FSAM to cope with the high-dimensional, low-sample-size nature of microarray data and thus improves classification efficiency. Experiments are carried out on numerous microarray datasets. The results demonstrate that AHP-based gene selection outperforms each of the single ranking methods. Furthermore, the AHP-FSAM combination achieves higher accuracy in microarray data classification than various competing classifiers. The proposed approach is therefore useful to medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.
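To make the idea of combining several filter rankings concrete, the following minimal sketch scores genes with three of the criteria named above (t-test, signal-to-noise ratio, and an ROC-based score) and merges them by averaging ranks. The abstract does not specify the modified AHP, so the mean-rank aggregation here is a deliberately simple stand-in and not the paper's method; all function names, the synthetic data, and the choice of the Welch form of the t-statistic are assumptions made for illustration.

```python
import numpy as np

def welch_t(x0, x1):
    """Absolute Welch t-statistic per gene (rows are samples, columns are genes)."""
    num = x0.mean(axis=0) - x1.mean(axis=0)
    den = np.sqrt(x0.var(axis=0, ddof=1) / len(x0) + x1.var(axis=0, ddof=1) / len(x1))
    return np.abs(num / den)

def signal_to_noise(x0, x1):
    """Absolute difference of class means divided by the sum of class standard deviations."""
    num = x0.mean(axis=0) - x1.mean(axis=0)
    den = x0.std(axis=0, ddof=1) + x1.std(axis=0, ddof=1)
    return np.abs(num / den)

def auc_separation(x0, x1):
    """|AUC - 0.5| per gene, computed from all cross-class pairs (ties count half)."""
    greater = (x1[:, None, :] > x0[None, :, :]).mean(axis=(0, 1))
    ties = (x1[:, None, :] == x0[None, :, :]).mean(axis=(0, 1))
    return np.abs(greater + 0.5 * ties - 0.5)

def ranks_from_scores(scores):
    """Convert scores to ranks, where rank 0 is the highest-scoring gene."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks

def select_genes(X, y, k):
    """Return indices of the k genes with the best mean rank across criteria.

    Assumption: mean-rank aggregation replaces the modified AHP, whose
    details are not given in the abstract.
    """
    x0, x1 = X[y == 0], X[y == 1]
    criteria = [welch_t, signal_to_noise, auc_separation]
    rank_matrix = np.stack([ranks_from_scores(f(x0, x1)) for f in criteria])
    mean_rank = rank_matrix.mean(axis=0)
    return np.argsort(mean_rank, kind="stable")[:k]

# Synthetic two-class data: 40 samples, 200 genes, genes 0..2 made informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :3] += 4.0
selected = select_genes(X, y, k=3)
print(sorted(selected.tolist()))
```

Averaging ranks rather than raw scores keeps criteria with different scales comparable. A weighted scheme, such as one derived from pairwise comparisons in AHP, could replace the plain mean inside `select_genes` without changing the rest of the sketch.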
