How to Tackle an Extremely Hard Learning Problem: Learning Causal Structures from Non-Experimental Data without the Faithfulness Assumption or the Like

Most methods for learning causal structures from non-experimental data rely on some assumptions of simplicity, the most famous of which is known as the Faithfulness condition. Without assuming such conditions to begin with, we develop a learning theory for inferring the structure of a causal Bayesian network, and we use the theory to provide a novel justification of a certain assumption of simplicity that is closely related to Faithfulness. Here is the idea. With only the Markov and IID assumptions, causal learning is notoriously too hard to achieve statistical consistency but we show that it can still achieve a quite desirable "combined" mode of stochastic convergence to the truth: having almost sure convergence to the true causal hypothesis with respect to almost all causal Bayesian networks, together with a certain kind of locally uniform convergence. Furthermore, every learning algorithm achieving at least that joint mode of convergence has this property: having stochastic convergence to the truth with respect to a causal Bayesian network $N$ only if $N$ satisfies a certain variant of Faithfulness, known as Pearl's Minimality condition---as if the learning algorithm were designed by assuming that condition. This explains, for the first time, why it is not merely optional but mandatory to assume the Minimality condition---or to proceed as if we assumed it.