Bayes factors and the geometry of discrete hierarchical loglinear models

A standard tool for model selection in a Bayesian framework is the Bayes factor, which compares the marginal likelihoods of the data under two competing models. In this paper, we consider the class of hierarchical loglinear models for discrete data given in the form of a contingency table with multinomial sampling. We take the Diaconis-Ylvisaker conjugate prior as the prior distribution on the loglinear parameters and the uniform distribution as the prior on the space of models. Under these conditions, the Bayes factor between two models is a function of their prior and posterior normalizing constants. These constants are functions of the hyperparameters $(m,\alpha)$, which can be interpreted respectively as the marginal counts and the total count of a fictive contingency table. We study the behaviour of the Bayes factor when $\alpha$ tends to zero. Two mathematical objects play a central role in this study: first, the interior $C$ of the convex hull $\bar{C}$ of the support of the multinomial distribution for a given hierarchical loglinear model, together with its faces, and second, the characteristic function $\mathbb{J}_C$ of this convex set $C$. We show that, when $\alpha$ tends to $0$, if the data lie on a face $F_i$ of $\bar{C}_i$, $i=1,2$, of dimension $k_i$, then the Bayes factor behaves like $\alpha^{k_1-k_2}$. This implies in particular that when the data lie in $C_1$ and in $C_2$, i.e. when $k_i$ equals the dimension of model $J_i$, the sparser model is favored, thus confirming the idea of Bayesian regularization.
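In schematic form, and with notation that is only illustrative (writing $I_{J}$ for the normalizing constant of the Diaconis-Ylvisaker prior under model $J$, and leaving the exact posterior hyperparameters implicit), the Bayes factor between models $J_1$ and $J_2$ is the ratio of posterior to prior normalizing constants under each model,
\[
\mathrm{BF}_{12}(x)\;=\;\frac{I_{J_1}(\text{posterior hyperparameters})\big/ I_{J_1}(m,\alpha)}{I_{J_2}(\text{posterior hyperparameters})\big/ I_{J_2}(m,\alpha)},
\]
and the limiting behaviour described above can be stated as: if the data $x$ lie on a face $F_i$ of $\bar{C}_i$ of dimension $k_i$, $i=1,2$, then
\[
\mathrm{BF}_{12}(x)\;\asymp\;\alpha^{\,k_1-k_2}\qquad\text{as }\alpha\to 0,
\]
so that for data in $C_1\cap C_2$, where $k_i$ equals the dimension of model $J_i$, the factor diverges in favor of the sparser (lower-dimensional) model.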
