S3QL: A distributed domain specific language for controlled semantic integration of life sciences data

Background

The value and usefulness of data increases when it is explicitly interlinked with related data. This is the core principle of Linked Data. For life sciences researchers, harnessing the power of Linked Data to improve biological discovery is still challenged by a need to keep pace with rapidly evolving domains and requirements for collaboration and control as well as with the reference semantic web ontologies and standards. Knowledge organization systems (KOSs) can provide an abstraction for publishing biological discoveries as Linked Data without complicating transactions with contextual minutia such as provenance and access control.

We have previously described the Simple Sloppy Semantic Database (S3DB) as an efficient model for creating knowledge organization systems using Linked Data best practices with explicit distinction between domain and instantiation and support for a permission control mechanism that automatically migrates between the two. In this report we present a domain specific language, the S3DB query language (S3QL), to operate on its underlying core model and facilitate management of Linked Data.

Results

Reflecting the data driven nature of our approach, S3QL has been implemented as an application programming interface for S3DB systems hosting biomedical data, and its syntax was subsequently generalized beyond the S3DB core model. This achievement is illustrated with the assembly of an S3QL query to manage entities from the Simple Knowledge Organization System. The illustrative use cases include gastrointestinal clinical trials, genomic characterization of cancer by The Cancer Genome Atlas (TCGA) and molecular epidemiology of infectious diseases.

Conclusions

S3QL was found to provide a convenient mechanism to represent context for interoperation between public and private datasets hosted at biomedical research institutions and linked data formalisms.
