Tree-Structured Stick Breaking for Hierarchical Data

Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data.
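To make the nested stick-breaking construction concrete, here is a minimal sketch of drawing node assignments from a tree-structured stick-breaking prior. It is written under stated assumptions rather than as the authors' exact model: a constant stopping parameter alpha (the paper allows a depth-dependent parameter), Beta(1, gamma) sticks over each node's children, and illustrative names (draw_node, nu, psi). It shows the generative construction only, not the slice-sampling inference described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)


def draw_node(nu, psi, alpha=1.0, gamma=0.5):
    """Draw one node assignment from a tree-structured stick-breaking prior.

    nu[path]  -- Beta(1, alpha) probability of stopping at node `path`
    psi[path] -- list of Beta(1, gamma) sticks over the children of `path`
    Both dicts are filled lazily, so only the visited part of the
    (unboundedly wide and deep) tree is ever instantiated.
    """
    path = ()
    while True:
        # Stop and place the datum at the current node with probability nu[path].
        if path not in nu:
            nu[path] = rng.beta(1.0, alpha)
        if rng.random() < nu[path]:
            return path

        # Otherwise descend: pick a child by stick breaking over the siblings.
        sticks = psi.setdefault(path, [])
        u, acc, remain, child = rng.random(), 0.0, 1.0, 0
        while True:
            if child == len(sticks):
                sticks.append(rng.beta(1.0, gamma))
            acc += remain * sticks[child]        # probability mass of this child
            if u < acc:
                break
            remain *= 1.0 - sticks[child]        # mass left for later children
            child += 1
        path = path + (child,)


# Draw node assignments for a few data points sharing the same sticks.
# Paths are tuples of child indices: () is the root, (0, 2) is the third
# child of the root's first child, and so on.
nu, psi = {}, {}
print([draw_node(nu, psi, alpha=1.0, gamma=0.5) for _ in range(5)])
```

The lazy dictionaries mirror the "unbounded width and depth" property of the prior: sticks are only instantiated along paths that data actually visit, which is also what makes slice-sampling-style MCMC over such trees feasible in practice.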
