On Consistency of Bayesian Inference with Mixtures of Logistic Regression

This is a theoretical study of the consistency properties of Bayesian inference using mixtures of logistic regression models. When standard logistic regression models are combined in a mixture-of-experts setup, a flexible model is formed for the relationship between a binary (yes/no) response y and a vector of predictors x. Bayesian inference conditional on the observed data can then be used for regression and classification. This letter gives conditions on choosing the number of experts (i.e., the number of mixing components) k, or on choosing a prior distribution for k, under which Bayesian inference is consistent, in the sense that the posterior distribution asymptotically concentrates near the underlying true relationship between y and x. The resulting classification rule is then also consistent, in the sense that its performance approaches that of the optimal (Bayes) classifier. We establish these consistency properties either with a nonstochastic k that grows slowly with the sample size n of the observed data, or with a random k that takes arbitrarily large values with small but nonzero probabilities.
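To make the model concrete, below is a minimal sketch of a k-expert mixture of logistic regressions of the kind described above, assuming a softmax gating network over the predictors; the function and parameter names are hypothetical illustrations, not the paper's notation or any particular library's API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def moe_logistic_prob(x, gating_weights, expert_weights):
    """P(y = 1 | x) under a k-expert mixture of logistic regressions.

    x:              (d,) predictor vector
    gating_weights: (k, d) softmax gating parameters (assumed parameterization)
    expert_weights: (k, d) one logistic-regression coefficient vector per expert
    """
    # Softmax gating: mixing proportions g_k(x), nonnegative and summing to 1
    logits = gating_weights @ x
    g = np.exp(logits - logits.max())
    g /= g.sum()
    # Each expert is a standard logistic regression in x
    expert_probs = sigmoid(expert_weights @ x)
    # Mixture: P(y = 1 | x) = sum_k g_k(x) * sigmoid(x' beta_k)
    return float(g @ expert_probs)

# Usage sketch: k = 3 experts, d = 2 predictors (intercepts folded into x if desired)
rng = np.random.default_rng(0)
x = rng.normal(size=2)
G = rng.normal(size=(3, 2))
B = rng.normal(size=(3, 2))
print(moe_logistic_prob(x, G, B))  # a probability in (0, 1)
```

The gating step is the only part that distinguishes this from a plain finite mixture: the mixing proportions depend on x, which is what gives the mixture-of-experts model its flexibility for regression and classification.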
