The nested Chinese restaurant process and Bayesian inference of topic hierarchies

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to a clustering of documents according to the topics they share at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to the posterior distribution over trees, topics, and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
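To make the preferential attachment dynamics concrete, here is a minimal Python sketch of drawing document paths from an nCRP truncated to a fixed depth. The class name `NestedCRP`, the concentration parameter `gamma`, and the fixed `depth` are illustrative assumptions, not the paper's notation or code. At each node, the sketch applies the standard Chinese restaurant process rule: if n earlier documents passed through the node, an existing child k is chosen with probability n_k / (n + gamma), and a new child with probability gamma / (n + gamma).

```python
import random
from collections import defaultdict


class NestedCRP:
    """Hypothetical sketch of an nCRP truncated to `depth` levels.

    Each node is identified by the tuple of child indices on its path
    from the root; the root is the empty tuple ().
    """

    def __init__(self, gamma, depth, seed=None):
        if gamma <= 0:
            raise ValueError("gamma must be positive")
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.gamma = gamma
        self.depth = depth
        self.rng = random.Random(seed)
        # counts[node][k] = number of documents whose path went through child k of node
        self.counts = defaultdict(list)

    def _choose_child(self, node):
        """One CRP draw at `node`: existing child k w.p. n_k/(n+gamma), new child w.p. gamma/(n+gamma)."""
        child_counts = self.counts[node]
        n = sum(child_counts)
        u = self.rng.uniform(0, n + self.gamma)
        cumulative = 0.0
        for k, n_k in enumerate(child_counts):
            cumulative += n_k
            if u < cumulative:
                return k
        # Remaining mass gamma/(n+gamma): open a new branch.
        return len(child_counts)

    def sample_path(self):
        """Draw one document's root-to-leaf path and update the branch counts."""
        node = ()
        path = [node]
        for _ in range(self.depth - 1):
            k = self._choose_child(node)
            if k == len(self.counts[node]):
                self.counts[node].append(0)  # new child starts with zero visits
            self.counts[node][k] += 1
            node = node + (k,)
            path.append(node)
        return path


# Usage: draw paths for 100 hypothetical documents in a 3-level tree.
crp = NestedCRP(gamma=1.0, depth=3, seed=0)
paths = [crp.sample_path() for _ in range(100)]
print("level-1 branch sizes:", crp.counts[()])
```

Because visited branches are chosen in proportion to how many documents already took them, popular subtrees tend to keep attracting new documents, which is the clustering effect described above. A full topic model would also attach a topic to every node and allocate each word of a document to one level of its path; that part is deliberately left out of this sketch.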
