Join Bayes Nets: A new type of Bayes net for relational data

Much real-world data is maintained in relational format, with different tables storing information about entities and their links or relationships. The structure (schema) of the database is essentially that of a logical language, with variables ranging over individual entities and predicates for relationships and attributes. Our work combines the graphical structure of Bayes nets with the logical structure of relational databases to support knowledge discovery in databases. We introduce Join Bayes nets, a new type of Bayes net for representing and learning class-level dependencies between attributes from the same table and from different tables; such dependencies are important for policy making and strategic planning. Focusing on class-level dependencies keeps the model simple and makes inference and learning tractable. As usual with Bayes nets, the graphical structure supports efficient inference and reasoning. We show that applying standard Bayes net inference algorithms to the learned models provides fast and accurate probability estimates for queries that involve attributes and relationships from multiple tables.
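To make the notion of a class-level query concrete, the following minimal sketch computes one directly from a toy database by counting over entity pairs. The tables, attribute names, and values are hypothetical and are not taken from the paper. The sketch shows only the kind of multi-table probability that a learned model would be asked to estimate; it is not the Join Bayes net learning or inference procedure itself.

```python
from itertools import product

# Hypothetical toy database: two entity tables and one relationship table.
students = {
    "s1": {"intelligence": "high"},
    "s2": {"intelligence": "low"},
    "s3": {"intelligence": "high"},
}
courses = {
    "c1": {"difficulty": "hard"},
    "c2": {"difficulty": "easy"},
}
registered = {("s1", "c1"), ("s2", "c2"), ("s3", "c1"), ("s3", "c2")}


def class_level_probability(condition):
    """Fraction of all (student, course) pairs that satisfy `condition`."""
    pairs = list(product(students, courses))
    hits = sum(1 for s, c in pairs if condition(s, c))
    return hits / len(pairs)


# P(Registered(S, C), intelligence(S) = high, difficulty(C) = hard)
p_joint = class_level_probability(
    lambda s, c: (s, c) in registered
    and students[s]["intelligence"] == "high"
    and courses[c]["difficulty"] == "hard"
)

# P(Registered(S, C))
p_registered = class_level_probability(lambda s, c: (s, c) in registered)

# P(intelligence = high, difficulty = hard | Registered)
p_conditional = p_joint / p_registered
print(p_joint, p_registered, p_conditional)
```

The query ranges over the classes of students and courses rather than over any named individual, which is what distinguishes a class-level dependency from a prediction about a specific entity. Counting like this requires a pass over the joined tables for every query; a Bayes net that represents the same distribution could answer such queries with standard inference instead.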
