Most methods for learning causal structure from non-experimental data rely on some assumption of simplicity, the best known of which is the Faithfulness condition. Without assuming any such condition to begin with, we develop a learning theory for inferring the structure of a causal Bayesian network, and we use the theory to provide a novel justification of a certain assumption of simplicity that is closely related to Faithfulness. The idea is this. Under only the Markov and IID assumptions, causal learning is notoriously too hard to achieve statistical consistency, but we show that it can still achieve a quite desirable "combined" mode of stochastic convergence to the truth: almost sure convergence to the true causal hypothesis with respect to almost all causal Bayesian networks, together with a certain kind of locally uniform convergence. Furthermore, every learning algorithm that achieves at least this combined mode of convergence has the following property: it converges stochastically to the truth with respect to a causal Bayesian network $N$ only if $N$ satisfies a certain variant of Faithfulness, known as Pearl's Minimality condition, as if the algorithm had been designed by assuming that condition. This explains, for the first time, why it is not merely optional but mandatory to assume the Minimality condition, or at least to proceed as if we assumed it.
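The gap between Faithfulness and Pearl's Minimality can be illustrated with a standard toy example (a hedged sketch for intuition, not taken from the paper): a linear-Gaussian network over $X, Y, Z$ with edges $X \to Y$, $Y \to Z$, and $X \to Z$, whose path coefficients are chosen so that the two directed paths from $X$ to $Z$ cancel. The resulting distribution violates Faithfulness ($X$ and $Z$ are marginally independent despite being d-connected) yet still satisfies Minimality, because dropping the edge $X \to Z$ would impose $X \perp Z \mid Y$, which is false.

```python
import numpy as np

# Toy linear-Gaussian SEM with cancelling paths (illustrative sketch):
#   X -> Y (coef +1),  Y -> Z (coef +1),  X -> Z (coef -1)
rng = np.random.default_rng(0)
n = 200_000

X = rng.standard_normal(n)
Y = 1.0 * X + rng.standard_normal(n)             # edge X -> Y
Z = 1.0 * Y - 1.0 * X + rng.standard_normal(n)   # edges Y -> Z, X -> Z

# The paths X -> Z (weight -1) and X -> Y -> Z (weight +1) cancel,
# so the marginal correlation of X and Z is ~0: Faithfulness fails.
marginal = np.corrcoef(X, Z)[0, 1]

# But X and Z remain dependent given Y (partial correlation is clearly
# nonzero), so the edge X -> Z cannot be removed without violating the
# Markov condition: Minimality still holds.
resX = X - np.polyfit(Y, X, 1)[0] * Y            # residual of X on Y
resZ = Z - np.polyfit(Y, Z, 1)[0] * Y            # residual of Z on Y
partial = np.corrcoef(resX, resZ)[0, 1]

print(f"marginal corr(X,Z)   = {marginal:+.3f}")   # near zero
print(f"partial corr(X,Z|Y)  = {partial:+.3f}")    # bounded away from zero
```

For this parameterization the population partial correlation works out to $-1/\sqrt{3} \approx -0.577$, so any consistent conditional-independence test will detect the dependence given $Y$ while finding no marginal dependence, which is exactly the pattern that trips up algorithms designed under full Faithfulness but not those relying only on Minimality.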