Cross-Ontology Multi-level Association Rule Mining in the Gene Ontology

The Gene Ontology (GO) has become the internationally accepted standard for representing the function, process, and location aspects of gene products. The wealth of GO annotation data is a valuable source of implicit knowledge about relationships among these aspects. We describe a new association rule mining method that discovers implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction, between terms from the same sub-ontology, or both. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data, and mines interesting multi-level cross-ontology association rules. We applied our method to publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross-ontology rules from the two datasets, respectively. We show that, compared with previously published methods, our approach discovers more association rules from the GO, and rules of higher quality as evaluated by biologists. Biologically interesting rules discovered by our method reveal previously unknown and surprising knowledge about co-occurring GO terms.
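To make the idea of generalized transactions and cross-ontology rules concrete, the following is a minimal sketch and not the COLL procedure itself, whose details are not given here. It assumes a toy hierarchy with made-up term names (the `mf:` and `bp:` prefixes, the `PARENTS` map, and both function names are hypothetical), extends each gene product's annotation set with all non-root ancestors in a single pass rather than level by level, and keeps only single-term rules whose two sides come from different sub-ontologies.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy hierarchy (child -> list of parents); real GO is a much larger DAG.
PARENTS = {
    "mf:kinase": ["mf:catalytic"],
    "mf:catalytic": ["mf:root"],
    "bp:phosphorylation": ["bp:metabolic"],
    "bp:metabolic": ["bp:root"],
}

# Hypothetical annotation sets, one per gene product.
TRANSACTIONS = [
    {"mf:kinase", "bp:phosphorylation"},
    {"mf:kinase", "bp:phosphorylation"},
    {"mf:catalytic", "bp:metabolic"},
    {"mf:kinase"},
]


def generalize(terms):
    """Return the terms plus all their ancestors, skipping the uninformative roots."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in seen or term.endswith(":root"):
            continue
        seen.add(term)
        stack.extend(PARENTS.get(term, []))
    return seen


def cross_ontology_rules(transactions, min_support, min_confidence):
    """Mine rules a -> b where a and b belong to different sub-ontologies."""
    generalized = [generalize(t) for t in transactions]
    n = len(generalized)
    item_count = Counter(term for t in generalized for term in t)
    pair_count = Counter()
    for t in generalized:
        # Ordered pairs, because a -> b and b -> a have different confidence.
        for a, b in permutations(t, 2):
            if a.split(":")[0] != b.split(":")[0]:
                pair_count[(a, b)] += 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        confidence = count / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


for antecedent, consequent, support, confidence in cross_ontology_rules(TRANSACTIONS, 0.5, 0.9):
    print(f"{antecedent} -> {consequent} (support={support:.2f}, confidence={confidence:.2f})")
```

In this toy data the third gene product is annotated only with the more general terms, so a rule between `bp:metabolic` and `mf:catalytic` gains support that a single-level miner working on the original annotations would not see. A faithful implementation would also need the GO relationship semantics and an interestingness filter, both of which this sketch omits.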
