Using Annotations from Controlled Vocabularies to Find Meaningful Associations

This paper presents the LSLink (Life Science Link) methodology, which provides users with a set of tools to explore the rich Web of interconnected, annotated objects in multiple repositories and to identify meaningful associations. Consider a physical link between objects in two repositories, where each object is annotated with controlled vocabulary (CV) terms from its own ontology, so that the link relates terms from two ontologies. Using a set of LSLink instances generated from a background dataset, we identify associations between pairs of CV terms that are potentially significant and may lead to new knowledge. We develop an approach based on the logarithm of the odds (LOD) to determine the confidence in, and support for, each association between a pair of CV terms. Using a case study of Entrez Gene objects annotated with GO terms and linked to PubMed objects annotated with MeSH terms, we describe a user validation and analysis task for exploring potentially significant associations.
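To make the idea of LOD-based confidence and support concrete, the following minimal sketch scores term pairs over a toy set of links. It is an illustration under stated assumptions, not the paper's definition: support is assumed to be the number of links in which both terms occur, and the LOD score is assumed to be the base-2 logarithm of the observed co-occurrence frequency over the frequency expected if the two terms were independent. The function name `lod_scores` and the term identifiers are hypothetical placeholders.

```python
import math
from collections import Counter
from itertools import product


def lod_scores(links):
    """Score every co-occurring (left term, right term) pair.

    `links` is a list of (left_terms, right_terms) tuples, one per link,
    e.g. the GO terms of a gene and the MeSH terms of a linked publication.
    Returns {(left, right): (support, lod)}.
    ASSUMPTION: lod = log2(observed / expected under independence);
    the paper's exact formula may differ.
    """
    n = len(links)
    left_count, right_count, pair_count = Counter(), Counter(), Counter()
    for left_terms, right_terms in links:
        left_terms, right_terms = set(left_terms), set(right_terms)
        left_count.update(left_terms)
        right_count.update(right_terms)
        # Every cross pair of terms on this link counts as one co-occurrence.
        pair_count.update(product(left_terms, right_terms))

    scores = {}
    for (left, right), support in pair_count.items():
        observed = support / n
        expected = (left_count[left] / n) * (right_count[right] / n)
        scores[(left, right)] = (support, math.log2(observed / expected))
    return scores


# Hypothetical toy data: four links with placeholder term identifiers.
links = [
    ({"GO:A"}, {"MeSH:X"}),
    ({"GO:A"}, {"MeSH:X"}),
    ({"GO:B"}, {"MeSH:Y"}),
    ({"GO:B"}, {"MeSH:X"}),
]
scores = lod_scores(links)
for pair, (support, lod) in sorted(scores.items()):
    print(pair, support, round(lod, 3))
```

In this toy data the pair (`GO:B`, `MeSH:Y`) occurs only once but twice as often as independence would predict, which illustrates why a score of this kind is usually read together with its support: a high LOD on very low support may not be reliable.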
