DBrev: Dreaming of a Database Revolution

The database community has provided excellent frameworks for efficient querying and for online transaction and analytical processing. Most of these frameworks assume that there is no uncertainty in the stored data. In recent years, however, many important applications have emerged that need to manage noisy, corrupted, or incomplete data, e.g., anonymized data, data derived from sensor systems, or data from information extraction and integration systems. For such applications the assumption of logical consistency may not be valid and needs to be revised. In particular, techniques such as probabilistic modelling and statistical inference may be necessary to draw meaningful conclusions from the underlying data. This paper presents DBrev, a hypothetical, intelligent database system for managing large quantities of uncertain data. We explain the main features of DBrev using the scenario of information extraction and integration. We point out research challenges that need to be tackled and discuss a new set of assumptions on which future database management frameworks need to build.
