Discovering Causal Rules in Relational Databases

This article explores the combined application of inductive learning algorithms and causal inference techniques to the problem of discovering causal rules among the attributes of a relational database. Given some relational data, each field can be treated as a random variable, and a hybrid graph can be built by detecting conditional independencies among the variables. The induced graph represents genuine and potential causal relations as well as spurious associations. When the variables are discrete, or have been discretized in order to test conditional independencies, supervised induction algorithms can be used to learn causal rules, that is, conditional statements in which causes appear as antecedents and effects as consequents. The approach is illustrated by experiments conducted on several data sets.
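
As an illustration of what testing conditional independencies among discrete fields can involve, the following is a minimal sketch of a stratified Pearson chi-square statistic. It is not the procedure used in the article: the function name `conditional_chi_square`, the helper `make_rows`, the column names `X`, `Y`, `Z`, and the toy counts are all assumptions made for this example. The toy table is built so that `X` and `Y` are exactly independent within each value of `Z` but weakly associated once `Z` is ignored.

```python
from collections import Counter, defaultdict


def conditional_chi_square(rows, x, y, z=()):
    """Return (statistic, degrees_of_freedom) for testing x independent of y given z.

    rows is a list of dicts (one per database tuple); x and y are field names;
    z is a tuple of conditioning field names (empty for a marginal test).
    """
    # Group the (x, y) value pairs by the joint value of the conditioning fields.
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[c] for c in z)].append((row[x], row[y]))

    statistic, dof = 0.0, 0
    for pairs in strata.values():
        n = len(pairs)
        joint = Counter(pairs)
        x_counts = Counter(a for a, _ in pairs)
        y_counts = Counter(b for _, b in pairs)
        # Pearson chi-square over every cell of this stratum's contingency table.
        for a, na in x_counts.items():
            for b, nb in y_counts.items():
                expected = na * nb / n
                statistic += (joint[(a, b)] - expected) ** 2 / expected
        dof += (len(x_counts) - 1) * (len(y_counts) - 1)
    return statistic, dof


def make_rows(z_value, counts):
    """Hypothetical helper: expand {(x, y): count} into one dict per tuple."""
    return [
        {"X": a, "Y": b, "Z": z_value}
        for (a, b), k in counts.items()
        for _ in range(k)
    ]


# Toy data (invented for illustration only).
rows = make_rows(0, {(0, 0): 4, (0, 1): 2, (1, 0): 2, (1, 1): 1}) + make_rows(
    1, {(1, 1): 4, (1, 0): 2, (0, 1): 2, (0, 0): 1}
)

marginal = conditional_chi_square(rows, "X", "Y")
conditional = conditional_chi_square(rows, "X", "Y", z=("Z",))
print("marginal:", marginal)
print("given Z:", conditional)
```

In practice the statistic would be compared against a chi-square distribution with the returned degrees of freedom to decide whether to treat the two fields as conditionally independent; that step, and the choice of significance level, are left out of this sketch.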
