Learning Essential Graph Markov Models From Data

In a model selection procedure where many models are to be compared, computational efficiency is critical. For acyclic digraph (ADG) Markov models (also known as DAG models or Bayesian networks), each ADG Markov equivalence class can be represented by a unique chain graph, called an essential graph (EG). This parsimonious representation may be used to facilitate selection among ADG models. Because EGs combine features of decomposable graphs and ADGs, a scoring metric can be developed for EGs with categorical (multinomial) data. Such a metric may permit local computations to be characterized directly on EGs, in turn yielding a learning procedure that does not require transformation to representative ADGs at each step for scoring; moreover, the scoring metric itself is not constrained by Markov equivalence.
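As a concrete illustration of the equivalence classes that essential graphs represent, the sketch below checks Markov equivalence of two small ADGs using the classical criterion of Verma and Pearl: two ADGs are Markov equivalent iff they share the same skeleton and the same v-structures (immoralities); the EG of a class keeps an edge directed only when every member of the class orients it the same way. This is a minimal sketch, not code from the paper; the parent-set dictionary encoding and the tiny example DAGs are assumptions made purely for illustration.

```python
# Minimal sketch (assumed encoding): a DAG is a dict mapping each node to the
# set of its parents. Markov equivalence is tested via the classical criterion:
# same skeleton and same v-structures (immoralities).

from itertools import combinations

def skeleton(dag):
    """Undirected edge set of a DAG given as {node: set(parents)}."""
    return {frozenset((u, v)) for v, pars in dag.items() for u in pars}

def v_structures(dag):
    """Triples (a, c, b) with a -> c <- b and a, b nonadjacent (immoralities)."""
    skel = skeleton(dag)
    vs = set()
    for c, pars in dag.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                vs.add((a, c, b))
    return vs

def markov_equivalent(d1, d2):
    """Verma-Pearl criterion for ADG Markov equivalence."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

# Two hypothetical DAGs over {X, Y, Z} that encode the same single independence
# statement (X independent of Z given Y), hence the same Markov model:
g1 = {"X": set(), "Y": {"X"}, "Z": {"Y"}}   # X -> Y -> Z
g2 = {"X": {"Y"}, "Y": set(), "Z": {"Y"}}   # X <- Y -> Z

# Same skeleton but with the immorality X -> Y <- Z: a different equivalence class.
g3 = {"X": set(), "Y": {"X", "Z"}, "Z": set()}

print(markov_equivalent(g1, g2))  # True: same class; its EG is the undirected chain X - Y - Z
print(markov_equivalent(g1, g3))  # False: the v-structure forces a different class (and EG)
```

Because g1 and g2 belong to the same class, any score-equivalent metric must score them identically; scoring the EG X - Y - Z directly, as the abstract describes, avoids picking an arbitrary ADG representative at each search step.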
