Bayesian Nonparametrics: Hierarchical Bayesian nonparametric models with applications

Hierarchical modeling is a fundamental concept in Bayesian statistics. The basic idea is that parameters are endowed with distributions which may themselves introduce new parameters, and this construction recurses. In this review we discuss the role of hierarchical modeling in Bayesian nonparametrics, focusing on models in which the infinite-dimensional parameters are treated hierarchically. For example, we consider a model in which the base measure for a Dirichlet process is itself treated as a draw from another Dirichlet process. This yields a natural recursion that we refer to as a hierarchical Dirichlet process. We also discuss hierarchies based on the Pitman-Yor process and on completely random measures. We demonstrate the value of these hierarchical constructions in a wide range of practical applications, including problems in computational biology, computer vision, and natural language processing.
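A minimal sketch of the recursion described above, in notation that does not appear in the abstract itself (H is a global base measure, gamma and alpha_0 are concentration parameters, and F is the observation model for observation i in group j):

\begin{aligned}
G_0 \mid \gamma, H &\sim \mathrm{DP}(\gamma, H), \\
G_j \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0), \qquad j = 1, \dots, J, \\
\theta_{ji} \mid G_j &\sim G_j, \\
x_{ji} \mid \theta_{ji} &\sim F(\theta_{ji}).
\end{aligned}

Because a draw G_0 from a Dirichlet process is almost surely discrete, the group-specific measures G_j share its atoms, which is what allows mixture components to be shared across groups; the hierarchies based on the Pitman-Yor process and on completely random measures replace the Dirichlet process at one or both levels with the corresponding process.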
