Learning Genetic and Gene Bayesian Networks with Hidden Variables: Bilayer Verification Algorithm

To improve the recovery of gene-gene and marker-gene (eQTL) interaction networks from microarray and genetic data, we propose a new procedure for learning Bayesian networks. This algorithm, termed Bilayer Verification, starts from a user-specified leaf node and searches upstream to locate portions of the biological interaction network that can be verified as unconfounded by hidden variables such as protein levels. We provide theoretical justification for this procedure, which learns Bayesian networks by recursively finding two levels of v-structures in the data. We also discuss how the procedure specializes, and the efficiencies it gains, when exogenous variables (those with no parents), such as genetic markers, can be included in the network.
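
As a small illustration of the v-structure signature the procedure relies on, the sketch below simulates a collider X -> Z <- Y and checks that X and Y are marginally uncorrelated but become correlated once Z is conditioned on. This is not the Bilayer Verification algorithm itself: the function names, the linear-Gaussian simulation, and the fixed correlation threshold are all assumptions made for illustration, and a real analysis would use a proper conditional independence test.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def looks_like_v_structure(x, y, z, threshold=0.1):
    """Heuristic check (hypothetical threshold): x and y are marginally
    uncorrelated but correlated given z, as expected for x -> z <- y."""
    marginally_independent = abs(pearson(x, y)) < threshold
    dependent_given_z = abs(partial_corr(x, y, z)) > threshold
    return marginally_independent and dependent_given_z


# Simulated data (assumption: linear-Gaussian relationships, fixed seed).
rng = random.Random(0)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]          # e.g. an exogenous parent
y = [rng.gauss(0, 1) for _ in range(n)]          # a second, independent parent
z = [xi + yi + rng.gauss(0, 0.5) for xi, yi in zip(x, y)]  # collider: x -> z <- y
w = [zi + rng.gauss(0, 0.5) for zi in z]         # chain: x -> z -> w

print(looks_like_v_structure(x, y, z))  # collider case
print(looks_like_v_structure(x, w, z))  # chain case, used as a negative control
```

In the chain case, x and w are already marginally correlated, so the check rejects it; only the collider produces the "independent, then dependent given the child" pattern.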