Bayesian Network Learning with Parameter Constraints

The task of learning models for many real-world problems requires incorporating domain knowledge into learning algorithms, to enable accurate learning from a realistic volume of training data. This paper considers a variety of types of domain knowledge for constraining parameter estimates when learning Bayesian networks. In particular, we consider domain knowledge that constrains the values of, or relationships among, subsets of parameters in a Bayesian network with known structure. We incorporate a wide variety of parameter constraints into learning procedures for Bayesian networks by formulating this task as a constrained optimization problem. The assumptions made in module networks, dynamic Bayesian networks, and context-specific independence models can be viewed as particular cases of such parameter constraints. We present closed-form solutions or fast iterative algorithms for estimating parameters subject to several specific classes of parameter constraints, including equalities and inequalities among parameters, constraints on individual parameters, and constraints on sums and ratios of parameters, for both discrete and continuous variables. Our methods cover learning from both frequentist and Bayesian points of view, and from both complete and incomplete data. We present formal guarantees for our estimators, as well as methods for automatically learning useful parameter constraints from data. To validate our approach, we apply it to the domain of fMRI brain image analysis, where we demonstrate the ability of our system first to learn useful relationships among parameters, and then to use them to constrain the training of the Bayesian network, resulting in improved cross-validated accuracy of the learned model. Experiments on synthetic data are also presented.
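To make the idea of equality constraints among parameters concrete, the following is a minimal sketch (not the paper's implementation) of the simplest case: when domain knowledge asserts that two parent configurations of a discrete node share the same conditional distribution, the constrained maximum-likelihood estimate has a closed form obtained by pooling the counts of the tied rows before normalizing. The count matrix here is illustrative data, not from the paper.

```python
import numpy as np

# Hypothetical sufficient statistics N[j, k]: number of observations of
# X = k under parent configuration j, for a binary node with two parent
# configurations (illustrative numbers only).
counts = np.array([[30.0, 10.0],
                   [26.0, 14.0]])

# Unconstrained MLE: normalize each row of counts independently.
unconstrained = counts / counts.sum(axis=1, keepdims=True)

# Constrained MLE under the parameter-tying (equality) constraint
# theta[0, :] == theta[1, :]: pooling counts across the tied rows and
# normalizing maximizes the likelihood subject to the constraint.
pooled = counts.sum(axis=0)            # [56., 24.]
tied = pooled / pooled.sum()           # [0.7, 0.3]
constrained = np.tile(tied, (counts.shape[0], 1))

print(constrained)  # both rows equal [0.7, 0.3]
```

More general equality and inequality constraints, as the abstract notes, do not always admit such a closed form and instead require the constrained-optimization (e.g. Lagrange multiplier or iterative projection) machinery developed in the paper.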
