Probabilistic relational model benchmark generation: Principle and application

Validating any database mining methodology requires an evaluation process in which the availability of benchmarks is essential. In this paper, we aim to randomly generate relational database benchmarks that make it possible to check probabilistic dependencies among attributes. We are particularly interested in probabilistic relational models (PRMs), which extend Bayesian networks (BNs) to a relational data mining context and enable effective and robust reasoning about relational data structures. Although many works have focused, separately, on the random generation of Bayesian networks and of relational databases, we have identified no such work for PRMs. This paper provides an algorithmic approach for generating random PRMs from scratch to fill this gap. The proposed method generates PRMs as well as synthetic relational data from a randomly generated relational schema and a random set of probabilistic dependencies. This can be of interest to machine learning researchers who want to evaluate their proposals in a common framework, as well as to database designers who want to evaluate the effectiveness of the components of a database management system.
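
To make the idea of a randomly generated relational schema with a random set of probabilistic dependencies concrete, here is a minimal Python sketch. It is not the algorithm proposed in the paper: the function names `random_schema` and `random_dependencies`, the table and attribute naming, and the rule that a table may only reference earlier tables are all assumptions chosen to keep both the schema and the dependency structure acyclic. It covers structure only and omits parameter generation and data sampling.

```python
import random


def random_schema(n_tables, max_attrs, rng):
    """Build a random acyclic schema (hypothetical helper).

    Table T_i may hold foreign keys only to tables T_j with j < i,
    so the reference graph between tables cannot contain a cycle.
    """
    schema = {}
    for i in range(n_tables):
        attrs = [f"a{k}" for k in range(rng.randint(1, max_attrs))]
        n_fk = rng.randint(0, min(i, 2))  # at most two foreign keys per table
        fks = rng.sample([f"T{j}" for j in range(i)], n_fk)
        schema[f"T{i}"] = {"attrs": attrs, "fks": fks}
    return schema


def random_dependencies(schema, max_parents, rng):
    """Draw random parents for every attribute (hypothetical helper).

    Candidate parents are earlier attributes of the same table and all
    attributes of directly referenced tables. Because referenced tables
    always come earlier, the resulting dependency graph is a DAG.
    """
    deps = {}
    for table, spec in schema.items():
        for k, attr in enumerate(spec["attrs"]):
            candidates = [(table, b) for b in spec["attrs"][:k]]
            for ref in spec["fks"]:
                candidates += [(ref, b) for b in schema[ref]["attrs"]]
            n = rng.randint(0, min(max_parents, len(candidates)))
            deps[(table, attr)] = rng.sample(candidates, n)
    return deps


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a benchmark can be reproduced
    schema = random_schema(n_tables=4, max_attrs=3, rng=rng)
    deps = random_dependencies(schema, max_parents=2, rng=rng)
    for child, parents in deps.items():
        print(child, "<-", parents)
```

Fixing a generation order in advance is only one simple way to guarantee acyclicity; it does not sample uniformly over all valid structures, which a real benchmark generator may need to address.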
