Paper Information - Multi-Source Causal Analysis: Learning Bayesian Networks from Multiple Datasets

Multi-Source Causal Analysis: Learning Bayesian Networks from Multiple Datasets

We argue that causality is a useful, if not necessary, concept for the integrative analysis of multiple data sources. Specifically, we show that it enables learning causal relations from (a) data obtained under different experimental conditions, (b) data over different variable sets, and (c) data over semantically similar variables that nevertheless cannot be pooled together for various technical reasons. Case (c) in particular often arises when analyzing multiple gene-expression datasets. For cases (a) and (b), preliminary algorithms already exist, albeit with some limitations, while for case (c) we develop and evaluate a new method. Preliminary empirical results provide evidence of improved performance in learning causal relations when multiple sources are combined using our method, compared with learning from each dataset individually. In this context, we introduce the problem of Multi-Source Causal Analysis (MSCA), defined as the problem of inferring and inducing causal knowledge from multiple sources of data and knowledge. The grand vision of MSCA is to enable the automated or semi-automated, large-scale integration of available data to construct causal models involving a significant part of human concepts.

Ioannis Tsamardinos | Asimakis P. Mariglis
