Discovery of Causal Rules Using Partial Association

Discovering causal relationships in large databases of observational data is challenging. The pioneering work in this area was rooted in the theory of Bayesian network (BN) learning, which, however, is an NP-complete problem. Hence, several constraint-based algorithms have been developed to discover causal relationships efficiently in large databases. These methods usually use the idea of BN learning, directly or indirectly, and focus on causal relationships with a single cause variable. In this paper, we propose an approach to mining causal rules in large databases of binary variables. Our method expands the scope of causality discovery to causal relationships with multiple cause variables, and we utilise partial association tests to exclude noncausal associations and so ensure the high reliability of the discovered causal rules. Furthermore, an efficient algorithm is designed for performing the tests in large databases. We evaluate the method on a set of real-world diagnostic data. The results show that our method can effectively discover interesting causal rules in large databases.
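
To make the idea of a partial association test concrete, the following is a minimal sketch rather than the algorithm proposed in the paper. It assumes the test is of the Mantel-Haenszel type for binary variables: the association between a candidate cause and the outcome is examined within each stratum of the remaining (control) variables, and the evidence is pooled across strata. The function name `partial_association`, the table layout, and the optional continuity correction are illustrative assumptions.

```python
from math import erfc, sqrt


def partial_association(strata, continuity=True):
    """Mantel-Haenszel-style test of partial association (illustrative sketch).

    strata: iterable of 2x2 tables ((n11, n12), (n21, n22)), one per
            combination of control-variable values. Rows are cause
            present/absent, columns are outcome present/absent.
    Returns (statistic, p_value); the statistic is referred to a
    chi-square distribution with 1 degree of freedom.
    """
    diff = 0.0  # sum over strata of observed minus expected n11
    var = 0.0   # sum over strata of the hypergeometric variance of n11
    for (n11, n12), (n21, n22) in strata:
        n = n11 + n12 + n21 + n22
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        diff += n11 - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0  # no stratum has variation in both margins
    num = max(abs(diff) - (0.5 if continuity else 0.0), 0.0)
    stat = num * num / var
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, erfc(sqrt(stat / 2))


# Hypothetical counts: the association persists within both strata.
associated = [((20, 5), (5, 20)), ((15, 10), (5, 20))]
# Hypothetical counts: no association once the control variable is fixed.
independent = [((10, 10), (10, 10)), ((30, 10), (15, 5))]

stat_a, p_a = partial_association(associated)
stat_i, p_i = partial_association(independent)
```

Under this sketch, a candidate rule would be retained only when the pooled statistic exceeds the chosen chi-square critical value (for example, 3.84 at the 5% level), meaning the association does not vanish after controlling for the other variables.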