Small, fuzzy and interpretable gene expression based classifiers

MOTIVATION: Interpreting classification models derived from gene-expression data is rarely straightforward, yet it is an important part of the analytical process. We investigate the performance of small rule-based classifiers based on fuzzy logic on five datasets that differ in size, laboratory origin and biomedical domain.

RESULTS: The classifiers yielded rules that can be readily examined by biomedical researchers. The fuzzy-logic-based classifiers compare favorably with logistic regression on all five datasets.

AVAILABILITY: Prototype available upon request.
