Ontology databases

On the one hand, ontologies provide a means of formally specifying complex descriptions of, and relationships among, pieces of information in a way that is expressive yet amenable to automated processing and reasoning. When data are annotated using terms from an ontology, the annotated instances inherit the ontology's formal semantics. An ontology may have as few as a dozen or as many as tens of thousands of terms, but the set of annotated instances is often several orders of magnitude larger, ranging from millions to possibly trillions of instances. Unfortunately, existing reasoning techniques cannot scale to these sizes. On the other hand, relational database management systems provide mechanisms for storing, retrieving, and maintaining the integrity of large amounts of data. They are well known for scaling to extremely large data sets, with some systems claiming to manage over a quadrillion data items. This dissertation defines ontology databases as a mapping from ontologies to relational databases in order to combine the expressiveness of ontologies with the scalability of relational databases. This mapping is sound and, under certain conditions, complete. That is, the database behaves like a knowledge base that is faithful to the semantics of a given ontology. What distinguishes this work is the treatment of the relational database management system as an active reasoning component rather than as a passive storage and retrieval system. The main contributions of this dissertation are: (i) the theory and implementation particulars for mapping ontologies to databases, (ii) subsumption-based reasoning, (iii) inconsistency detection, (iv) scalability studies, and (v) information integration (specifically, information exchange). This work is novel because it is the first attempt to embed a logical reasoning system, specified by a Semantic Web ontology, into a plain relational database management system using active database technologies. It also introduces the not-gadget, which relaxes the closed-world assumption and increases the expressive power of the logical system without significant cost. Finally, it demonstrates how to deploy the same framework as an information integration system for data exchange scenarios, which is an important step toward semantic information integration over distributed data repositories.
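
To make the idea of the database as an active reasoning component more concrete, the following is a minimal sketch of subsumption-based reasoning implemented with database triggers. It is an illustration under stated assumptions, not the dissertation's actual implementation: it uses SQLite through Python's standard `sqlite3` module, stores each concept as a one-column table, and the toy ontology (`Sister`, `Sibling`, `Person`) and the helper names `build_ontology_database` and `instances` are hypothetical.

```python
import sqlite3

# Hypothetical toy ontology (an assumption for illustration only).
# Each pair (sub, sup) states that every instance of sub is also an instance of sup.
CONCEPTS = ["Sister", "Sibling", "Person"]
SUBSUMPTIONS = [("Sister", "Sibling"), ("Sibling", "Person")]


def build_ontology_database(conn):
    """Create one table per concept and one trigger per subsumption axiom."""
    # Lets a trigger fire again inside its own cascade; this only matters
    # for cyclic hierarchies and is harmless for the acyclic toy ontology.
    conn.execute("PRAGMA recursive_triggers = ON")
    for concept in CONCEPTS:
        conn.execute(f'CREATE TABLE "{concept}" (id TEXT PRIMARY KEY)')
    for sub, sup in SUBSUMPTIONS:
        # Forward-propagate each new instance of sub into sup at insert time.
        # INSERT OR IGNORE skips rows that are already present in sup.
        conn.execute(
            f'CREATE TRIGGER "{sub}_isa_{sup}" AFTER INSERT ON "{sub}" '
            f'BEGIN INSERT OR IGNORE INTO "{sup}" (id) VALUES (NEW.id); END'
        )


def instances(conn, concept):
    """Return the sorted instance identifiers stored for a concept."""
    rows = conn.execute(f'SELECT id FROM "{concept}" ORDER BY id')
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_ontology_database(conn)
    # Asserting a fact about the most specific concept is enough;
    # the triggers derive the facts about the more general concepts.
    conn.execute('INSERT INTO "Sister" (id) VALUES (?)', ("mary",))
    print(instances(conn, "Person"))
```

In this sketch the inference happens when data are loaded, so a later query such as `SELECT id FROM "Person"` needs no reasoning step at query time. That is one plausible reading of treating the database as an active component; the trade-off is that derived facts are stored explicitly and take up space.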
