Aligning Ontologies and Integrating Textual Evidence for Pathway Analysis of Microarray Data

Expression arrays are introducing a paradigmatic change in biology by shifting experimental approaches from single gene studies to genome-level analysis, monitoring the ex-pression levels of several thousands of genes in parallel. The massive amounts of data obtained from the microarray data needs to be integrated and interpreted to infer biological meaning within the context of information-rich pathways. In this paper, we present a methodology that integrates textual information with annotations from cross-referenced ontolo-gies to map genes to pathways in a semi-automated way. We illustrate this approach and compare it favorably to other tools by analyzing the gene expression changes underlying the biological phenomena related to stroke. Stroke is the third leading cause of death and a major disabler in the United States. Through years of study, researchers have amassed a significant knowledge base about stroke, and this knowledge, coupled with new technologies, is providing a wealth of new scientific opportunities. The potential for neu-roprotective stroke therapy is enormous. However, the roles of neurogenesis, angiogenesis, and other proliferative re-sponses in the recovery process following ischemia and the molecular mechanisms that lead to these processes still need to be uncovered. Improved annotation of genomic and pro-teomic data, including annotation of pathways in which genes and proteins are involved, is required to facilitate their interpretation and clinical application. While our approach is not aimed at replacing existing curated pathway databases, it reveals multiple hidden relationships that are not evident with the way these databases analyze functional groupings of genes from the Gene Ontology.

[1]  C. Harrington,et al.  For Personal Use. Only Reproduce with Permission from the Lancet , 2022 .

[2]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[3]  Carole A. Goble,et al.  Semantic Similarity Measures as Tools for Exploring the Gene Ontology , 2002, Pacific Symposium on Biocomputing.

[4]  Russ B. Altman,et al.  Including Biological Literature Improves Homology Search , 2001, Pacific Symposium on Biocomputing.

[5]  Daniel E. Zak,et al.  PAINT: a promoter analysis and interaction network generation tool for gene regulatory network identification. , 2003, Omics : a journal of integrative biology.

[6]  Pedro M. Coutinho,et al.  Implementation of a Functional Semantic Similarity Measure between Gene-Products , 2003 .

[7]  Olivier Bodenreider,et al.  Non-Lexical Approaches to Identifying Associative Relations in the Gene Ontology , 2004, Pacific Symposium on Biocomputing.

[8]  T. Speed,et al.  GOstat: find statistically overrepresented Gene Ontologies within a group of genes. , 2004, Bioinformatics.

[9]  Christian Posse,et al.  Cross-Ontological Analytics: Combining Associative and Hierarchical Relations in the Gene Ontologies to Assess Gene Product Similarity , 2006, International Conference on Computational Science.

[10]  M A Andrade,et al.  Position-specific annotation of protein function based on multiple homologs. , 1999, Proceedings. International Conference on Intelligent Systems for Molecular Biology.

[11]  Alex Bateman,et al.  The InterPro Database, 2003 brings increased coverage and new features , 2003, Nucleic Acids Res..

[12]  Carole A. Goble,et al.  Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation , 2003, Bioinform..

[13]  Patrick Pantel,et al.  Discovery of inference rules for question-answering , 2001, Natural Language Engineering.

[14]  Steven C. Lawlor,et al.  MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data , 2003, Genome Biology.

[15]  James Pustejovsky,et al.  Rerendering Semantic Ontologies: Automatic Extensions to UMLS through Corpus Analytics , 2002, ArXiv.

[16]  Olivier Bodenreider,et al.  Ontology-driven similarity approaches to supporting gene func- tional assessment , 2005 .

[17]  Zhiyong Lu,et al.  GO Molecular Function Terms Are Predictive of Subcellular Localization , 2004, Pacific Symposium on Biocomputing.