Compiling Bayesian Networks into Neural Networks

Criticism of the use of Bayesian networks in expert systems has centered on the claim that probabilistic reasoning requires a massive amount of data in the form of conditional probabilities. This paper shows that, given information easily obtained from experts, namely the dependence model and some observations, the conditional probabilities can be estimated using backpropagation in such a way that the Bayesian character of the network is preserved during training. Applying Occam's razor yields a partial order among neural network structures. Experiments on the Multiplexer problem show that the network compiled from the more succinct causal model generalizes better than the one compiled from the less succinct model.
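
To make the central idea concrete, the following is a minimal sketch, not the paper's actual compilation procedure: a two-node Bayesian network A -> B with binary variables, where each conditional probability is parameterized through a sigmoid so that it remains a valid probability while being fit by backpropagation-style gradient descent on the negative log-likelihood. All names and the ground-truth values are hypothetical illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical ground-truth CPTs, used only to simulate observations.
true_p_a = 0.7                  # P(A=1)
true_p_b = {0: 0.2, 1: 0.9}     # P(B=1 | A=a)

# Simulated observations of (A, B).
n = 2000
A = (rng.random(n) < true_p_a).astype(float)
B = (rng.random(n) < np.where(A == 1, true_p_b[1], true_p_b[0])).astype(float)

# Unconstrained parameters; the sigmoid keeps every CPT entry in (0, 1),
# so the network's probabilistic interpretation is preserved during training.
theta_a = 0.0                   # logit of P(A=1)
theta_b = np.zeros(2)           # logits of P(B=1 | A=0) and P(B=1 | A=1)

lr = 0.5
for _ in range(2000):
    p_a = sigmoid(theta_a)
    p_b = sigmoid(theta_b)      # p_b[a] = P(B=1 | A=a)

    # Gradients of the per-group average negative log-likelihood. With a
    # Bernoulli likelihood and sigmoid parameterization, the gradient has
    # the familiar (prediction - target) form from backpropagation.
    grad_a = np.mean(p_a - A)
    grad_b = np.array([
        np.mean(p_b[a] - B[A == a]) if np.any(A == a) else 0.0
        for a in (0, 1)
    ])

    theta_a -= lr * grad_a
    theta_b -= lr * grad_b

print(f"P(A=1)     ~ {sigmoid(theta_a):.2f} (true {true_p_a})")
print(f"P(B=1|A=0) ~ {sigmoid(theta_b[0]):.2f} (true {true_p_b[0]})")
print(f"P(B=1|A=1) ~ {sigmoid(theta_b[1]):.2f} (true {true_p_b[1]})")
```

At convergence each estimated entry matches the empirical conditional frequency, i.e. the maximum-likelihood CPT, so gradient training and Bayesian parameter estimation agree here. The expert supplies the structure (A -> B); only the numeric conditional probabilities are learned from data.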