Towards Ad-Hoc Rule Semantics for Gene Expression Data

The notion of a rule is very popular and appears in different flavors, for example as association rules in data mining or as functional (or multivalued) dependencies in databases. Their syntax is the same, but their semantics differ widely. In this article, we focus on semantics for which Armstrong's axioms are sound and complete. In this setting, we propose a unifying framework into which any "well-formed" semantics for rules may be integrated. We do not focus on the underlying data mining problems posed by the discovery of rules; rather, we emphasize the expressiveness of our contribution in a particular application domain: the understanding of gene regulatory networks from gene expression data. The key idea is that biologists can either choose among some predefined semantics or define the meaning of their rules that best fits their requirements. Our proposal has been implemented and integrated into MeV, an existing open-source system of the TIGR environment devoted to microarray data interpretation.
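To make the idea of a pluggable rule semantics concrete, here is a minimal sketch in Python. It is not the framework described in the article: it assumes, purely for illustration, that a semantics is given by a pairwise agreement predicate on expression levels, and that a rule X -> Y holds when every pair of samples agreeing on all genes of X also agrees on all genes of Y. The names `satisfies`, `exact`, `same_level`, the threshold of 1.0, and the toy samples are all hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence

Sample = Dict[str, float]             # gene name -> expression level
Agree = Callable[[float, float], bool]  # hypothetical "agreement" predicate


def satisfies(samples: Sequence[Sample], lhs: Sequence[str],
              rhs: Sequence[str], agree: Agree) -> bool:
    """Check the rule lhs -> rhs under the semantics induced by `agree`.

    Assumed semantics: for every pair of samples, agreement on all genes
    of lhs must imply agreement on all genes of rhs.
    """
    for s, t in combinations(samples, 2):
        agree_lhs = all(agree(s[g], t[g]) for g in lhs)
        agree_rhs = all(agree(s[g], t[g]) for g in rhs)
        if agree_lhs and not agree_rhs:
            return False
    return True


def exact(a: float, b: float) -> bool:
    """Functional-dependency-style agreement: identical values."""
    return a == b


def same_level(a: float, b: float, threshold: float = 1.0) -> bool:
    """Illustrative ad-hoc agreement: both values fall on the same side
    of an (assumed) expression threshold."""
    return (a >= threshold) == (b >= threshold)


# Toy expression data; gene names and values are made up.
samples = [
    {"g1": 0.2, "g2": 0.1, "g3": 2.0},
    {"g1": 0.4, "g2": 0.3, "g3": 0.5},
    {"g1": 1.8, "g2": 1.5, "g3": 1.9},
]

print(satisfies(samples, ["g1"], ["g3"], exact))       # holds vacuously
print(satisfies(samples, ["g1"], ["g2"], same_level))  # holds
print(satisfies(samples, ["g1"], ["g3"], same_level))  # refuted
```

On real-valued data, the exact-equality predicate tends to hold only vacuously because two samples rarely share a value, whereas a domain-specific predicate can actually confirm or refute a rule. Rules checked in this pairwise form obey reflexivity, augmentation, and transitivity whatever predicate is plugged in, which is the kind of property the article requires of a semantics.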
