The BAY-HIST Prediction Model for RDF Documents

In real-world RDF documents, the subject and object values of properties are often correlated. Identifying these relationships is relevant to many applications, e.g., query evaluation planning and linking analysis. In this paper we present the BAY-HIST Prediction Model, a combination of Bayesian networks and multidimensional histograms that estimates the probability of these dependencies. In general, Bayesian networks assume a small number of discrete values for each variable in the network. However, in the context of the Semantic Web, variables that represent the concepts in large RDF documents may take a very large number of values; thus, BAY-HIST uses multidimensional histograms to aggregate the data associated with each node in the network. We illustrate the benefits of applying BAY-HIST to the problem of query selectivity estimation as part of cost-based query optimization. We report initial experimental results on the predictive capability of this model and on the effectiveness of our optimization techniques when used together with BAY-HIST. The results suggest that the resulting evaluation plans are of higher quality than the plans identified by existing cost models that assume independence and uniform distribution of the data values.
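
To make the idea concrete, the following is a minimal sketch, not the BAY-HIST implementation itself, of how a two-node network X -> Y whose values are aggregated into histogram buckets could estimate the selectivity of a conjunctive predicate, compared with an estimate that assumes independence. The function names, the equi-width bucketing, and the synthetic data are all assumptions made for illustration.

```python
from collections import Counter

def bucket(value, lo, width, n_buckets):
    """Map a value to an equi-width bucket index (assumed bucketing scheme)."""
    idx = int((value - lo) // width)
    return max(0, min(n_buckets - 1, idx))

def build_model(pairs, lo, hi, n_buckets):
    """Build a 2-D histogram over (x, y) pairs plus the marginal counts.

    The joint counts stand in for the conditional table P(Y | X) of a
    hypothetical two-node Bayesian network X -> Y.
    """
    width = (hi - lo) / n_buckets
    joint, px, py = Counter(), Counter(), Counter()
    for x, y in pairs:
        bx = bucket(x, lo, width, n_buckets)
        by = bucket(y, lo, width, n_buckets)
        joint[(bx, by)] += 1
        px[bx] += 1
        py[by] += 1
    return {"joint": joint, "px": px, "py": py, "total": len(pairs)}

def selectivity_network(model, bx, by):
    """Estimate P(X in bx, Y in by) as P(X in bx) * P(Y in by | X in bx)."""
    if model["px"][bx] == 0:
        return 0.0
    p_x = model["px"][bx] / model["total"]
    p_y_given_x = model["joint"][(bx, by)] / model["px"][bx]
    return p_x * p_y_given_x

def selectivity_independent(model, bx, by):
    """Estimate the same probability assuming X and Y are independent."""
    return (model["px"][bx] / model["total"]) * (model["py"][by] / model["total"])

# Synthetic, fully correlated data: every y equals its x (illustration only).
pairs = [(i, i) for i in range(100)]
model = build_model(pairs, lo=0, hi=100, n_buckets=4)

est_network = selectivity_network(model, 0, 0)
est_independent = selectivity_independent(model, 0, 0)
true_selectivity = sum(1 for x, y in pairs if x < 25 and y < 25) / len(pairs)

print(est_network, est_independent, true_selectivity)
```

On this synthetic data the conditional estimate matches the true selectivity of the first bucket pair, while the independence assumption underestimates it, which is the kind of error that can mislead a cost-based optimizer.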
