An ontology-centric architecture for extensible scientific data management systems

Data management has become a critical challenge across a wide array of scientific disciplines, where sound data management is pivotal to the achievements and impact of research projects. Massive and rapidly expanding volumes of data, combined with data models that evolve over time, make data management an increasingly difficult task that warrants a new approach. In this paper we present an ontology-centric architecture for data management systems that is extensible and domain independent. In this architecture, the behaviors of domain concepts and objects are captured entirely by ontological entities, around which all data management tasks are carried out. The open and semantic nature of ontology languages also makes this architecture amenable to greater data reuse and interoperability. To evaluate the proposed architecture, we have applied it to the challenge of managing phenomics data.

Highlights:
- A novel, extensible ontology-centric architecture for scientific data management.
- We present the formal definitions of ontology versioning and dynamic composition.
- We demonstrate the feasibility of the architecture through an implemented system.
