Knowledge environments representing molecular entities for the virtual physiological human

In essence, the virtual physiological human (VPH) is a multiscale representation of human physiology, spanning the molecular level, cellular processes and the multicellular organization of tissues up to complex organ function. The different scales of the VPH deal with different entities, relationships and processes; consequently, the models used to describe and simulate biological functions vary significantly from scale to scale. Here, we describe methods and strategies for generating knowledge environments that represent molecular entities and can be used to model the molecular scale of the VPH. Our strategy combines information extraction from scientific text with the integration of information from biomolecular databases. We introduce @neuLink, a first prototype of an automatically generated, disease-specific knowledge environment that combines biomolecular, chemical, genetic and medical information. Finally, we provide a perspective on the future implementation and use of knowledge environments representing molecular entities for the VPH.
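The abstract does not specify the extraction or integration algorithms, so the following is only an illustrative sketch of the general idea of combining text mining with database integration. It assumes a simple dictionary lookup for recognizing entity names in text, uses co-occurrence within a text as a crude stand-in for relation extraction, and joins each recognized entity to a database record by a shared identifier. The synonym table, database entries, identifiers and function names (`tag_entities`, `build_environment`) are all hypothetical and are not part of @neuLink.

```python
import re
from collections import defaultdict

# Hypothetical synonym dictionary: lower-cased surface form -> canonical ID.
# Real systems would draw these from curated terminologies; entries are illustrative.
SYNONYMS = {
    "app": "GENE:APP",
    "amyloid precursor protein": "GENE:APP",
    "psen1": "GENE:PSEN1",
    "presenilin 1": "GENE:PSEN1",
    "donepezil": "CHEM:DONEPEZIL",
}

# Hypothetical database records keyed by the same canonical IDs.
DATABASE = {
    "GENE:APP": {"type": "gene", "interactors": ["GENE:PSEN1"]},
    "GENE:PSEN1": {"type": "gene", "interactors": ["GENE:APP"]},
    "CHEM:DONEPEZIL": {"type": "chemical", "interactors": []},
}


def tag_entities(text):
    """Return the set of canonical IDs whose synonyms occur as whole words in text."""
    lowered = text.lower()
    found = set()
    for term, entity_id in SYNONYMS.items():
        # \b anchors keep "app" from matching inside words such as "apply".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(entity_id)
    return found


def build_environment(texts):
    """Merge text-mined mentions and co-occurrences with database annotations."""
    env = defaultdict(lambda: {"mentions": 0, "cooccurs_with": set(), "record": None})
    for text in texts:
        ids = tag_entities(text)
        for entity_id in ids:
            node = env[entity_id]
            node["mentions"] += 1                       # evidence from text
            node["cooccurs_with"] |= ids - {entity_id}  # crude relation proxy
            node["record"] = DATABASE.get(entity_id)    # evidence from databases
    return dict(env)
```

For example, `build_environment(["Mutations in PSEN1 alter processing of amyloid precursor protein."])` links `GENE:APP` and `GENE:PSEN1` through their co-occurrence and attaches each entity's database record. A production system would replace the plain dictionary lookup with a dedicated named entity recognizer and the co-occurrence link with proper relation extraction.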
