Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations

Gene Ontology (GO) is a structured repository of concepts (GO terms) that are associated with one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. Among the available analysis approaches, association rules (AR) can provide useful knowledge by discovering biologically relevant, previously unknown associations between GO terms. In a previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. Here we adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms from the three sub-ontologies of GO. We conduct an in-depth performance evaluation of GO-WAR by mining publicly available GO-annotated datasets, showing how GO-WAR outperforms current state-of-the-art approaches.
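
To make the notion of a weighted, cross-ontology rule concrete, the following is a minimal sketch under stated assumptions and is not the GO-WAR algorithm itself. The gene names, term identifiers, and term weights are hypothetical toy values; the weighting scheme (a transaction's weight is the mean weight of its terms) is a generic choice from the weighted association rule literature, assumed here only for illustration.

```python
# Toy annotation corpus: each gene product is a "transaction" of GO terms.
# Term names are hypothetical; the prefix marks the sub-ontology
# (BP = biological process, MF = molecular function, CC = cellular component).
annotations = {
    "geneA": {"BP:t1", "MF:t2", "CC:t3"},
    "geneB": {"BP:t1", "MF:t2"},
    "geneC": {"BP:t1", "CC:t3"},
    "geneD": {"MF:t4"},
}

# Hypothetical per-term weights (assumption: higher means more informative).
weights = {"BP:t1": 0.5, "MF:t2": 1.0, "CC:t3": 1.5, "MF:t4": 2.0}


def transaction_weight(terms):
    """Mean weight of the terms annotating one gene product."""
    return sum(weights[t] for t in terms) / len(terms)


def weighted_support(itemset):
    """Weight of transactions containing the itemset over the total weight."""
    total = sum(transaction_weight(t) for t in annotations.values())
    covered = sum(
        transaction_weight(t) for t in annotations.values() if itemset <= t
    )
    return covered / total


def weighted_confidence(antecedent, consequent):
    """Weighted support of the whole rule divided by that of its antecedent."""
    return weighted_support(antecedent | consequent) / weighted_support(antecedent)


def is_cross_ontology(antecedent, consequent):
    """True when the rule's terms span more than one sub-ontology."""
    namespaces = {term.split(":")[0] for term in antecedent | consequent}
    return len(namespaces) > 1


if __name__ == "__main__":
    rule = ({"BP:t1"}, {"MF:t2"})
    print(is_cross_ontology(*rule), round(weighted_confidence(*rule), 3))
```

In this sketch the rule BP:t1 → MF:t2 counts as cross-ontology because its two terms belong to different sub-ontologies. Its weighted confidence depends on how much weight the supporting gene products carry rather than on their raw count.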
