The nested Chinese restaurant process and hierarchical topic models

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics, and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
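To make the generative mechanism concrete, the sketch below shows how a document's path might be drawn from the nCRP: at each level of the tree, the document picks a child of its current node via an ordinary Chinese restaurant process, choosing an existing child in proportion to how many previous documents passed through it, or a new child with probability proportional to a concentration parameter gamma. This is a minimal illustration under assumed names (crp_choice, ncrp_sample_path, gamma), not the authors' implementation; a full hLDA sampler would additionally assign topics to nodes and words to levels.

```python
import random
from collections import defaultdict

def crp_choice(counts, gamma):
    """Chinese restaurant process step: pick an existing child with
    probability proportional to its count, or a new child (a fresh
    integer key) with probability proportional to gamma."""
    total = sum(counts.values()) + gamma
    r = random.uniform(0, total)
    for child, count in counts.items():
        r -= count
        if r <= 0:
            return child
    return max(counts, default=-1) + 1  # open a new child node

def ncrp_sample_path(tree_counts, depth, gamma):
    """Sample one root-to-leaf path of the given depth from the nCRP.

    tree_counts maps a node (a tuple encoding its path from the root)
    to the visit counts of its children. Counts are updated in place,
    so repeated calls exhibit the preferential attachment dynamics
    that cluster documents onto popular subtrees.
    """
    path = ()
    for _ in range(depth):
        child = crp_choice(tree_counts[path], gamma)
        tree_counts[path][child] += 1
        path = path + (child,)
    return path

# Example: 20 "documents" choosing paths in a depth-3 tree.
tree = defaultdict(lambda: defaultdict(int))
paths = [ncrp_sample_path(tree, depth=3, gamma=1.0) for _ in range(20)]
```

Running this repeatedly shows the characteristic rich-get-richer behavior: early paths attract later documents, yielding a few heavily shared subtrees alongside occasional novel branches, with gamma controlling the rate at which new branches appear.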
