Shared relationship analysis: ranking set cohesion and commonalities within a literature-derived relationship network

MOTIVATION: There is a general scientific need to be able to identify and evaluate what any given set of 'objects' (e.g. genes, phenotypes, chemicals, diseases) has in common. Whether it is to classify, expand upon or identify commonalities and functional groupings, informational needs can be diverse and the best source to identify relationships among a potentially heterogeneous set of objects is the scientific literature.

RESULTS: We first establish a network of related objects by their co-occurrence within MEDLINE records. A set of objects within this network can then be queried to identify shared relationships, and a method is presented to score their statistical relevance by comparing observed frequencies with what would be expected in a random network model. Using Gene Ontology (GO) categories, we demonstrate that this method enables a quantitative ranking of the 'cohesiveness' of a set of objects and, importantly, allows other objects related to this set to be identified and evaluated for their 'cohesion' to it.

SUPPLEMENTAL INFORMATION: A list of ranked genes related to each GO category analyzed can be found at http://innovation.swmed.edu/IRIDESCENT/GO_relationships.htm
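The idea of scoring shared relationships against a random network can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the scoring formula, so the random model used here (an object with k links is assumed to connect to any other node with probability k / (N - 1)) and the observed/expected ratio are assumptions made for illustration. The function names `build_network` and `shared_relationships`, the `min_shared` parameter, and the toy records are likewise hypothetical and do not come from MEDLINE.

```python
from collections import defaultdict
from itertools import combinations


def build_network(records):
    """Link every pair of objects that co-occur in at least one record."""
    neighbors = defaultdict(set)
    for record in records:
        for a, b in combinations(sorted(set(record)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def shared_relationships(neighbors, query, min_shared=1):
    """Rank objects outside `query` by observed/expected links to the query set."""
    query = set(query)
    n_nodes = len(neighbors)
    results = []
    for obj, linked in neighbors.items():
        if obj in query:
            continue
        observed = len(linked & query)
        if observed < min_shared:
            continue
        # Assumed random model (not taken from the paper): an object with
        # degree k links to each of the other N - 1 nodes with probability
        # k / (N - 1), so it is expected to reach |query| * k / (N - 1)
        # members of the query set by chance.
        expected = len(query) * len(linked) / (n_nodes - 1)
        results.append((obj, observed, expected, observed / expected))
    # Highest observed/expected ratio first.
    return sorted(results, key=lambda r: r[3], reverse=True)


# Toy records invented for illustration; each set stands in for the objects
# recognized in one literature record.
records = [
    {"SHH", "FGF8", "forebrain development"},
    {"OTX2", "forebrain development"},
    {"BMP4", "forebrain development", "SHH"},
    {"SHH", "cancer"},
    {"TP53", "cancer"},
    {"BRCA1", "cancer"},
    {"TP53", "BRCA1", "apoptosis"},
    {"FGF8", "OTX2"},
]

network = build_network(records)
ranked = shared_relationships(network, {"SHH", "FGF8", "OTX2", "BMP4"})
for obj, observed, expected, ratio in ranked:
    print(f"{obj}: observed={observed}, expected={expected:.2f}, ratio={ratio:.2f}")
```

In this toy network, "forebrain development" is linked to all four query genes and ranks above "cancer", which is linked to only one of them. A ratio is the simplest possible comparison of observed and expected counts; a real analysis would more plausibly weight links by co-occurrence frequency and use a proper significance test.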
