Pitman-Yor Diffusion Trees

We introduce the Pitman-Yor Diffusion Tree (PYDT) for hierarchical clustering, a generalization of the Dirichlet Diffusion Tree (Neal, 2001) that removes the restriction to binary branching. We describe the generative process and show that it yields an exchangeable distribution over data points. We prove several theoretical properties of the model and then present two inference methods: a collapsed MCMC sampler, which allows us to model uncertainty over tree structures, and a computationally efficient greedy Bayesian EM search algorithm. Both algorithms use message passing on the tree structure. We demonstrate the utility of the model and algorithms on synthetic and real-world data, both continuous and binary.
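To make the removal of the binary restriction concrete, the sketch below computes the choice a new data point faces when it reaches an existing branch point. It assumes a PYDT branch-point rule with a discount parameter `alpha` and a concentration parameter `theta`, which the abstract itself does not state: with `m` previous points split over `K` branches with counts `n_k`, the point follows branch `k` with probability (n_k - alpha)/(m + theta) or starts a new branch with probability (theta + alpha*K)/(m + theta). The function name `branch_probabilities` is hypothetical. Under this assumed rule, setting alpha = theta = 0 makes the new-branch probability zero, so existing branch points never gain a third child, which matches the binary branching of the Dirichlet Diffusion Tree.

```python
from fractions import Fraction


def branch_probabilities(counts, alpha, theta):
    """Hypothetical helper: assumed PYDT choice probabilities at a branch point.

    counts: number of previous data points that went down each existing branch.
    alpha:  assumed discount parameter, 0 <= alpha <= 1.
    theta:  assumed concentration parameter.
    Returns (existing, new): the probability of following each existing
    branch, and the probability of creating a new branch at this point.
    """
    if any(n < 1 for n in counts):
        raise ValueError("every existing branch must carry at least one point")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    K = len(counts)   # number of existing branches
    m = sum(counts)   # total number of previous points through this node
    if theta + alpha * K < 0:
        raise ValueError("theta + alpha * K must be non-negative")
    denom = m + theta
    if denom <= 0:
        raise ValueError("m + theta must be positive")
    # Each existing branch is discounted by alpha ...
    existing = [(n_k - alpha) / denom for n_k in counts]
    # ... and the discounted mass, plus theta, goes to a brand-new branch.
    new = (theta + alpha * K) / denom
    return existing, new


# Exact arithmetic with Fraction makes the probabilities easy to check:
# three previous points split 2/1, alpha = 1/2, theta = 1.
existing, new = branch_probabilities([2, 1], Fraction(1, 2), Fraction(1))
# existing == [Fraction(3, 8), Fraction(1, 8)], new == Fraction(1, 2)
```

The probabilities always sum to one, because the total discount alpha*K removed from the existing branches is exactly what is added to the new-branch term. Passing `Fraction(0)` for both parameters returns a new-branch probability of zero, which illustrates the binary special case under the assumed rule.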

[1] Radford M. Neal et al. Density Modeling and Clustering Using Dirichlet Diffusion Trees. 2003.

[2] Ryan P. Adams et al. The Gaussian Process Density Sampler. NIPS, 2008.

[3] Shigeo Abe. Pattern Classification. Springer London, 2001.

[4] Thomas Hofmann et al. Learning Annotated Hierarchies from Relational Data. 2007.

[5] Zoubin Ghahramani et al. Message Passing Algorithms for Dirichlet Diffusion Trees. ICML, 2011.

[6] Radford M. Neal. Slice Sampling. The Annals of Statistics, 2003.

[7] Yee Whye Teh et al. Bayesian Rose Trees. UAI, 2010.

[8] Tom Minka et al. Expectation Propagation for Approximate Bayesian Inference. UAI, 2001.

[9] Katherine A. Heller et al. Bayesian Hierarchical Clustering. ICML, 2005.

[10] Thomas L. Griffiths et al. The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies. JACM, 2007.

[11] Julian M. Kupiec et al. Robust Part-of-Speech Tagging Using a Hidden Markov Model. 1992.

[12] Charles Kemp et al. The Discovery of Structural Form. Proceedings of the National Academy of Sciences, 2008.

[13] P. McCullagh et al. Gibbs Fragmentation Trees. arXiv:0704.0945, 2007.

[14] Thomas L. Griffiths et al. Hierarchical Topic Models and the Nested Chinese Restaurant Process. NIPS, 2003.

[15] Yee Whye Teh et al. A Hierarchical Bayesian Language Model Based on Pitman-Yor Processes. ACL, 2006.

[16] Jorge Nocedal et al. On the Limited Memory BFGS Method for Large Scale Optimization. Math. Program., 1989.

[17] Yee Whye Teh et al. Bayesian Agglomerative Clustering with Coalescents. NIPS, 2007.

[18] D. Aldous. Exchangeability and Related Topics. 1985.

[19] Michael I. Jordan et al. Tree-Structured Stick Breaking for Hierarchical Data. NIPS, 2010.

[20] L. J. Savage et al. Symmetric Measures on Cartesian Products. 1955.

[21] Christopher K. I. Williams. A MCMC Approach to Hierarchical Mixture Modelling. NIPS, 1999.

[22] David G. Stork et al. Pattern Classification. 1973.

[23] Hal Daumé et al. The Infinite Hierarchical Factor Regression Model. NIPS, 2008.