Query evaluation with soft-key constraints

Key violations often occur in real-life datasets, especially those integrated from different sources, so enforcing key constraints strictly on such datasets is not feasible. In this paper we formalize the notion of soft-key constraints on probabilistic databases, which allow violations of a key constraint by penalizing every violating world by a quantity proportional to the violation. To represent a probabilistic database with such constraints, we define a class of Markov networks on which query evaluation can be done in PTIME. We also study the evaluation of conjunctive queries on relations with soft keys and present a dichotomy that separates these queries into those that are in PTIME and the rest, which are #P-hard.
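
As a rough illustration of the possible-worlds reading of a soft key, the sketch below enumerates every world of a tiny relation by brute force and down-weights worlds that violate the key. It is not the paper's construction: the tuple-independent input, the function names `world_distribution` and `query_probability`, and the choice to multiply a world's weight by a factor `penalty` once per violating pair of tuples are all assumptions made for this example. The enumeration is exponential in the number of tuples and is meant only to make the semantics concrete, not to reflect the PTIME evaluation discussed above.

```python
from itertools import product


def world_distribution(tuples, penalty):
    """Return a normalized distribution over possible worlds.

    tuples  : list of (key, value, prob) triples, assumed independent a priori.
    penalty : assumed factor in [0, 1] applied once per pair of tuples that
              share a key but disagree on the value (0 acts as a hard key,
              1 ignores the key entirely).
    """
    weights = {}
    for mask in product([False, True], repeat=len(tuples)):
        # Prior weight of the world under tuple independence.
        weight = 1.0
        for present, (_, _, prob) in zip(mask, tuples):
            weight *= prob if present else (1.0 - prob)

        # Count pairs of present tuples that violate the key.
        chosen = [t for present, t in zip(mask, tuples) if present]
        violations = sum(
            1
            for i in range(len(chosen))
            for j in range(i + 1, len(chosen))
            if chosen[i][0] == chosen[j][0] and chosen[i][1] != chosen[j][1]
        )

        # Soft key: each violation scales the weight down by `penalty`.
        weights[mask] = weight * penalty ** violations

    total = sum(weights.values())
    return {mask: w / total for mask, w in weights.items()}


def query_probability(dist, tuples, predicate):
    """Sum the probability of every world whose tuple list satisfies predicate."""
    return sum(
        p
        for mask, p in dist.items()
        if predicate([t for present, t in zip(mask, tuples) if present])
    )


if __name__ == "__main__":
    # Two tuples with the same key "a" but different values.
    relation = [("a", 1, 0.5), ("a", 2, 0.5)]
    dist = world_distribution(relation, penalty=0.5)
    has_value_1 = lambda world: any(value == 1 for _, value, _ in world)
    print(query_probability(dist, relation, has_value_1))
```

Under these assumptions, setting `penalty` to 1 recovers the unconstrained tuple-independent database, while setting it to 0 removes every violating world, which corresponds to a strictly enforced key.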
