Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables

Although learning probabilistic graphical models from data for predictive and causal inference has been studied extensively, almost all existing algorithms assume a single dataset of i.i.d. observations for all variables. In many applications it is impossible or impractical to obtain such a dataset, but multiple datasets of i.i.d. observations for different, overlapping subsets of the variables may be available. Tillman et al. [2009] showed how directed graphical models learned from such datasets can be integrated to construct an equivalence class of structures over all variables. Their procedure is correct, but it assumes that the structures being integrated do not entail contradictory conditional independences and dependences for the variables in their intersections. This assumption is reasonable asymptotically, but it rarely holds in practice with finite samples because statistical errors are frequent. We propose a new, correct procedure that learns such equivalence classes directly from the multiple datasets, which avoids this problem and makes the procedure more useful in practice. Empirical results indicate that our method is more accurate, runs faster, and requires less memory.

[1]  M. Kendall. Statistical Methods for Research Workers, 1937, Nature.

[2]  Judea Pearl, et al. Equivalence and Synthesis of Causal Models, 1990, UAI.

[3]  P. Spirtes, et al. Causation, prediction, and search, 1993.

[4]  Thomas S. Richardson, et al. Causal Inference in the Presence of Latent Variables and Selection Bias, 1995, UAI.

[5]  Nir Friedman, et al. The Bayesian Structural EM Algorithm, 1998, UAI.

[6]  J. Pearl. Causality: Models, Reasoning and Inference, 2000.

[7]  R. Cudeck. An estimate of the covariance between variables which are not jointly observed, 2000.

[8]  Melantjong. Random Generation Of Dags For Graph Drawing, 2000.

[9]  Peter Spirtes, et al. An Anytime Algorithm for Causal Inference, 2001, AISTATS.

[10]  P. Spirtes, et al. Ancestral graph Markov models, 2002.

[11]  C. Varin, et al. A note on composite likelihood inference and model selection, 2005.

[12]  D. Danks. Scientific Coherence and the Fusion of Experimental Results, 2005, The British Journal for the Philosophy of Science.

[13]  Richard Scheines, et al. Learning the Structure of Linear Latent Variable Models, 2006, J. Mach. Learn. Res.

[14]  Aapo Hyvärinen, et al. A Linear Non-Gaussian Acyclic Model for Causal Discovery, 2006, J. Mach. Learn. Res.

[15]  Jiji Zhang, et al. Causal Inference and Reasoning in Causally Insufficient Systems, 2006.

[16]  James Cussens, et al. Bayesian network learning by compiling to weighted MAX-SAT, 2008, UAI.

[17]  Bernhard Schölkopf, et al. Nonlinear causal discovery with additive noise models, 2008, NIPS.

[18]  Jiji Zhang, et al. Causal Reasoning with Ancestral Graphs, 2008, J. Mach. Learn. Res.

[19]  David Danks, et al. Integrating Locally Learned Causal Structures with Overlapping Variables, 2008, NIPS.

[20]  Robert E. Tillman, et al. Structure learning with independent non-identically distributed data, 2009, ICML '09.

[21]  Russell A. Poldrack, et al. Six problems for causal inference from fMRI, 2010, NeuroImage.

[22]  Tommi S. Jaakkola, et al. Learning Bayesian Network Structure using LP Relaxations, 2010, AISTATS.