Semantic Web Approach to Database Integration in the Life Sciences

This chapter describes the challenges of integrating databases that store diverse but related types of life sciences data. A major challenge is the syntactic and semantic heterogeneity of these databases, which creates a strong need for standardized syntactic and semantic data representations. We discuss how to address this need using emerging Semantic Web technologies based on the Resource Description Framework (RDF) standard. Two use cases, YeastHub and LinkHub, demonstrate how RDF database technology can be used to build data warehouses that facilitate the integration of genomic and proteomic data and identifiers.
