Data integration and visualization system for enabling conceptual biology

MOTIVATION: Integration of heterogeneous data in the life sciences is a growing and recognized challenge. The problem is not only to enable the study of such data within the context of a biological question but also, more fundamentally, to represent the available knowledge and make it accessible for mining.

RESULTS: Our integration approach is based on the premise that relationships between biological entities can be represented as a complex network. Context dependency is achieved through the choice of distance measures on these networks. For visualization, the biological entities and the distances between them are mapped into a lower-dimensional space using Sammon's mapping. The system is implemented as a multi-tier architecture comprising a native XML database and a software tool for querying and visualizing complex biological networks. The functionality of the system is demonstrated with two examples: (1) a multiple pathway retrieval, in which, given a pathway name, the system finds all relationships related to the query by checking the available metabolic pathway, transcriptional, signaling, protein-protein interaction and ontology annotation resources; and (2) a protein neighborhood search, in which, given a protein name, the system finds all entities connected to it within a specified depth. These two examples show that the system can conceptually traverse different databases to produce testable hypotheses and lead towards answers to complex biological questions.
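The protein neighborhood search and the Sammon's mapping step can be illustrated with a minimal sketch. It is not the system's implementation: the abstract does not specify the distance measures or the query interface, so the sketch assumes an unweighted adjacency-list network with hop count as the distance, and the entity names (`proteinA`, `pathwayX`, and so on) and function names are hypothetical. The stress function is the standard error measure that Sammon's mapping minimizes; the optimization itself is omitted.

```python
from collections import deque
import math


def neighborhood(graph, start, max_depth):
    """Breadth-first search: map each entity within max_depth hops of start
    to its hop distance (assumption: unweighted, undirected network)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def sammon_stress(d_high, coords):
    """Sammon's stress for a low-dimensional layout.

    d_high maps an ordered pair (a, b) to the original distance d*_ab (> 0);
    coords maps each entity to its low-dimensional coordinates.
    E = (1 / sum d*_ab) * sum (d*_ab - d_ab)^2 / d*_ab over the given pairs.
    """
    total = 0.0
    error = 0.0
    for (a, b), d_star in d_high.items():
        d_low = math.dist(coords[a], coords[b])
        total += d_star
        error += (d_star - d_low) ** 2 / d_star
    return error / total


# Hypothetical toy network mixing proteins, a pathway and a metabolite.
toy_network = {
    "proteinA": ["proteinB", "pathwayX"],
    "proteinB": ["proteinA", "proteinC"],
    "pathwayX": ["proteinA", "metaboliteM"],
    "proteinC": ["proteinB"],
    "metaboliteM": ["pathwayX"],
}

if __name__ == "__main__":
    print(neighborhood(toy_network, "proteinA", 1))
    print(neighborhood(toy_network, "proteinA", 2))
```

A layout that reproduces the original distances exactly has zero stress; larger values indicate that the low-dimensional picture distorts the network distances more.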
