Model Formulation: semCDI: A Query Formulation for Semantic Data Integration in caBIG

OBJECTIVES To develop mechanisms for formulating queries over the semantic representation of cancer-related data services available through the cancer Biomedical Informatics Grid (caBIG).

DESIGN The semCDI query formulation views caBIG semantic concepts, metadata, and data as an ontology, and defines a methodology for specifying queries in the SPARQL query language, extended with Horn rules. semCDI joins data that represent different concepts through associations modeled as object properties, and merges data that represent the same concept in different sources through Common Data Elements (CDEs) modeled as datatype properties. Horn rules supply the additional semantics that state the conditions under which data are merged.

VALIDATION To validate this formulation, a prototype was constructed and two queries were executed against currently available caBIG data services.

DISCUSSION The semCDI query formulation uses the rich semantic metadata available in caBIG to build queries and integrate data from multiple sources. Its promise will be further enhanced as more data services are registered in caBIG, and as more linkages are established between the knowledge contained in caBIG's NCI Thesaurus and the data contained in the Data Services.

CONCLUSION semCDI provides a formulation for creating queries on the semantic representation of caBIG. This constitutes the foundation for building a semantic data integration system for more efficient and effective querying and exploratory searching of cancer-related data.
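The two integration operations named under DESIGN, joining through an object property and merging through a shared CDE, can be illustrated with a minimal sketch. It is not the semCDI prototype and does not use a SPARQL engine: plain Python tuples stand in for RDF triples, and every identifier (the `svcA:`/`svcB:` instances, `assoc:encodes`, `cde:geneSymbol`, and the function names) is a hypothetical name chosen for illustration.

```python
# Toy triples (subject, predicate, object). All names are hypothetical;
# they only mimic two data services exposing overlapping concepts.
TRIPLES = {
    ("svcA:gene1", "rdf:type", "Gene"),
    ("svcA:gene1", "cde:geneSymbol", "EGFR"),
    ("svcA:gene1", "assoc:encodes", "svcA:prot1"),
    ("svcA:prot1", "rdf:type", "Protein"),
    ("svcA:prot1", "cde:proteinName", "Epidermal growth factor receptor"),
    ("svcB:gene9", "rdf:type", "Gene"),
    ("svcB:gene9", "cde:geneSymbol", "EGFR"),
    ("svcB:gene9", "cde:chromosome", "7"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def join_by_association(triples, src_class, assoc, dst_class):
    """Join instances of two different concepts via an object property."""
    pairs = []
    for s, _, o in match(triples, p=assoc):
        if (s, "rdf:type", src_class) in triples and (o, "rdf:type", dst_class) in triples:
            pairs.append((s, o))
    return sorted(pairs)


def same_by_cde(triples, cls, cde):
    """Horn-style rule, assumed for illustration:
    type(x, C) and type(y, C) and cde(x, v) and cde(y, v) -> same(x, y).
    Returns groups of instances that share a value for the given CDE."""
    by_value = {}
    for s, _, v in match(triples, p=cde):
        if (s, "rdf:type", cls) in triples:
            by_value.setdefault(v, set()).add(s)
    return sorted(sorted(group) for group in by_value.values() if len(group) > 1)


def merge_records(triples, group):
    """Merge the CDE (datatype property) values of instances judged the same."""
    merged = {}
    for s in group:
        for _, p, o in match(triples, s=s):
            if p.startswith("cde:"):
                merged.setdefault(p, set()).add(o)
    return merged


if __name__ == "__main__":
    print(join_by_association(TRIPLES, "Gene", "assoc:encodes", "Protein"))
    for group in same_by_cde(TRIPLES, "Gene", "cde:geneSymbol"):
        print(group, merge_records(TRIPLES, group))
```

In this sketch the join follows the association between a gene and a protein inside one source, while the merge rule combines two gene records from different sources because they agree on one CDE value. A real formulation would express the same conditions as SPARQL graph patterns plus rules rather than Python loops.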
