Hierarchical Models, Nested Models and Completely Random Measures

Statistics has both optimistic and pessimistic faces, with the Bayesian perspective often associated with the former and the frequentist perspective with the latter, but with foundational thinkers such as Jim Berger reminding us that statistics is fundamentally a Janus-like creature with two faces. In creating one field out of two perspectives, one of the unifying ideas emphasized by Berger and others is the Bayesian hierarchy, a modeling framework that simultaneously allows complex models to be created and tames their behavior. Another general tool for creating complex models while controlling their complexity is to nest simplified models inside more complex models, an appeal to the principle of "divide-and-conquer." An example is the classical finite mixture model, where each data point is modeled as arising from a single mixture component. Note that this appeal to divide-and-conquer is quite different from the recursive principle underlying hierarchical modeling: the latter strategy provides a way to share statistical strength among components, while the former strategy tends to isolate components. Of course, many complex models involve a blend of these strategies.

If the need to exploit hierarchical and nested structures is compelling in parametric models, it is still more compelling in Bayesian nonparametrics, where the growth in the number of degrees of freedom creates significant challenges for controlling model complexity. The basic idea of Bayesian nonparametrics is to replace classical finite-dimensional prior distributions with general stochastic processes, thereby allowing an open-ended number of degrees of freedom in a model. The framework expresses an essential optimism: only an optimist could hope to fit a model involving an infinite number of degrees of freedom based on finite data. But it also expresses the pessimism that simplified parametric models may be inadequate to capture the complexity of real-world phenomena.
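To make the contrast between a fixed, finite-dimensional prior and an open-ended nonparametric prior concrete, the following minimal Python/NumPy sketch (not drawn from the paper; the function names and the truncation tolerance `tol` are illustrative assumptions) compares a symmetric Dirichlet prior over a fixed number of mixture weights with the stick-breaking construction of Dirichlet process weights, where the number of non-negligible components is itself random and unbounded.

```python
import numpy as np

rng = np.random.default_rng(0)

def finite_mixture_weights(K, alpha=1.0):
    """Parametric case: symmetric Dirichlet prior over exactly K mixture weights."""
    return rng.dirichlet([alpha / K] * K)

def stick_breaking_weights(alpha=1.0, tol=1e-6):
    """Nonparametric case: stick-breaking construction of Dirichlet process weights.

    Breaks off Beta(1, alpha)-distributed fractions of the remaining stick until
    the leftover mass falls below `tol`, yielding an open-ended number of weights.
    """
    weights, remaining = [], 1.0
    while remaining > tol:
        frac = rng.beta(1.0, alpha)      # fraction of the remaining stick to break off
        weights.append(remaining * frac)  # weight assigned to the next component
        remaining *= (1.0 - frac)         # mass left for all further components
    return np.array(weights)

# Finite mixture prior: the number of components is fixed in advance.
print(finite_mixture_weights(K=5))

# Dirichlet process prior: the number of non-negligible components is not fixed.
w = stick_breaking_weights(alpha=5.0)
print(len(w), w[:10])
```

Truncating the stick at `tol` is only a practical device for the sketch; the construction itself defines an infinite sequence of weights, with larger values of `alpha` spreading mass over more components.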
