Building interpretable fuzzy models for high dimensional data analysis in cancer diagnosis

Background: Analysing gene expression data from microarray technologies is an important task in biology and medicine, and particularly in cancer diagnosis. Unlike most other popular methods for high dimensional bio-medical data, such as microarray gene expression or proteomics mass spectroscopy data, fuzzy rule-based models not only provide good classification results but can also be explained and interpreted in human-understandable terms through their fuzzy rules. However, the advantages offered by fuzzy-based techniques in microarray data analysis have not yet been fully explored in the literature. Although some recently developed fuzzy-based modeling approaches provide satisfactory classification results, the rule bases generated by most of the reported fuzzy models for gene expression data are still too large to be easily comprehensible.

Results: In this paper, we develop Multi-Objective Evolutionary Algorithms based Interpretable Fuzzy (MOEAIF) methods for analysing high dimensional bio-medical data sets, such as microarray gene expression data and proteomics mass spectroscopy data. We focus mainly on evaluating the proposed models on microarray gene expression cancer data sets, namely the lung cancer data set and the colon cancer data set, and we extend our investigations to another type of cancer data, the ovarian cancer data set. The experimental studies show that relatively simple and small fuzzy rule bases with satisfactory classification performance can be obtained for challenging microarray gene expression data sets.

Conclusions: We believe that fuzzy-based techniques, and in particular the methods proposed in this paper, can be very useful tools for dealing with high dimensional cancer data. We also argue that the potential of applying fuzzy-based techniques to microarray data analysis needs to be further explored.
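To make the idea of a small, interpretable fuzzy rule base more concrete, the following Python sketch shows one possible form such a classifier could take, together with a Pareto-dominance check of the kind a multi-objective search might use to trade error against rule-base size. It is an illustration only and not the MOEAIF method itself: the triangular membership functions, the linguistic terms, the single-winner inference, the gene indices and the class labels are all assumptions made for this example.

```python
from dataclasses import dataclass

def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for expression values normalised to [0, 1].
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

@dataclass
class Rule:
    antecedent: dict  # gene index -> linguistic term, e.g. {0: "high"}
    label: str        # class predicted when the rule wins

    def strength(self, sample):
        # Firing strength: minimum membership over all antecedent conditions.
        return min(
            triangular(sample[gene], *TERMS[term])
            for gene, term in self.antecedent.items()
        )

def classify(rules, sample):
    """Single-winner inference: return the label of the strongest firing rule."""
    best = max(rules, key=lambda r: r.strength(sample))
    return best.label if best.strength(sample) > 0.0 else None

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

# A two-rule base that reads as:
#   IF gene 0 is high                    THEN tumour
#   IF gene 0 is low AND gene 1 is high  THEN normal
rules = [
    Rule({0: "high"}, "tumour"),
    Rule({0: "low", 1: "high"}, "normal"),
]

print(classify(rules, [0.9, 0.2]))
print(classify(rules, [0.1, 0.8]))

# Objective vectors here are (error rate, number of rules); both are minimised.
print(dominates((0.10, 2), (0.10, 5)))
```

In this sketch a rule base with the same error but fewer rules dominates a larger one, which is one simple way to express the preference for compact, readable rule bases alongside classification performance.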
