From Observational Studies to Causal Rule Mining

Randomised controlled trials (RCTs) are the most effective approach to causal discovery, but in many circumstances they are impossible to conduct. Observational studies based on passively observed data are therefore widely accepted as an alternative. However, observational studies require prior knowledge to generate the hypotheses about cause-effect relationships to be tested, so they can only be applied to problems with available domain knowledge and a handful of variables. In practice, many datasets are high-dimensional, which puts them beyond the reach of observational studies and leaves a wealth of data sources untapped for causal discovery. In another direction, many efficient data mining methods have been developed to identify associations among variables in large datasets. The problem is that causal relationships imply associations, but the reverse is not always true. The two paradigms are nonetheless complementary: association rule mining can deal with the high-dimensionality problem, whereas observational studies can be used to eliminate noncausal associations. In this article, we propose the concept of causal rules (CRs) and develop an algorithm for mining CRs in large datasets. We use the idea of retrospective cohort studies to detect CRs based on the results of association rule mining. Experiments with both synthetic and real-world datasets demonstrate the effectiveness and efficiency of CR mining. In comparison with commonly used causal discovery methods, the proposed approach is generally faster and has better or competitive performance in finding correct or sensible causes. It is also capable of finding a cause consisting of multiple variables, a feature that other causal discovery methods do not possess.
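
The abstract does not spell out how an association is checked against a retrospective cohort design, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes binary records, takes a candidate rule "exposure -> outcome" (as an association rule miner might propose), pairs each exposed record with an unexposed record that has identical values on the control variables, and computes the odds ratio over the discordant pairs. The function name `matched_pair_odds_ratio`, the field names, and the toy data are all hypothetical.

```python
from collections import defaultdict


def matched_pair_odds_ratio(records, exposure, outcome, covariates):
    """Pair exposed and unexposed records with identical covariate values and
    return (odds_ratio, n_exposed_only, n_unexposed_only) over discordant pairs."""
    # Group records by their covariate values, split by exposure status:
    # index 0 holds exposed records, index 1 holds unexposed records.
    strata = defaultdict(lambda: ([], []))
    for r in records:
        key = tuple(r[c] for c in covariates)
        strata[key][0 if r[exposure] else 1].append(r)

    n_exposed_only = 0    # pairs where only the exposed record shows the outcome
    n_unexposed_only = 0  # pairs where only the unexposed record shows the outcome
    for exposed, unexposed in strata.values():
        # zip pairs records one-to-one and drops unmatched leftovers.
        for e, u in zip(exposed, unexposed):
            if e[outcome] and not u[outcome]:
                n_exposed_only += 1
            elif u[outcome] and not e[outcome]:
                n_unexposed_only += 1

    if n_unexposed_only == 0:
        odds_ratio = float("inf") if n_exposed_only else float("nan")
    else:
        odds_ratio = n_exposed_only / n_unexposed_only
    return odds_ratio, n_exposed_only, n_unexposed_only


def rec(smoker, disease, age):
    # Hypothetical toy record; all three fields are binary.
    return {"smoker": smoker, "disease": disease, "age": age}


toy = [
    rec(1, 1, 0), rec(1, 1, 0), rec(1, 1, 0), rec(1, 0, 0),
    rec(0, 0, 0), rec(0, 0, 0), rec(0, 1, 0), rec(0, 1, 0),
    rec(1, 1, 1), rec(0, 0, 1), rec(0, 0, 1),
]
result = matched_pair_odds_ratio(toy, "smoker", "disease", ["age"])
print(result)
```

An odds ratio well above 1 on the matched pairs would suggest that the association persists after the control variables are held fixed. A real test would also need a significance check (for example, a confidence interval on the odds ratio) and a principled choice of control variables, both of which this sketch leaves out.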
