Flexible Martingale Priors for Deep Hierarchies

When building priors over trees for Bayesian hierarchical models, there is a tension between maintaining desirable theoretical properties such as infinite exchangeability and important practical properties such as the ability to increase the depth of the tree to accommodate new data. We resolve this tension by presenting a family of infinitely exchangeable priors over discrete tree structures that allows the depth of the tree to grow with the data, and then showing that our family contains all hierarchical models with certain mild symmetry properties. We also show that deep hierarchical models are in general intimately tied to a process called a martingale, and use Doob’s martingale convergence theorem to demonstrate some unexpected properties of deep hierarchies.
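
For readers unfamiliar with the terminology, the following is a minimal sketch of the standard textbook definition of a martingale and of Doob's convergence theorem. The symbols $M_n$ and $\mathcal{F}_n$ are illustrative assumptions and do not come from the text above. Reading $n$ as depth in the hierarchy is likewise only one plausible interpretation.

```latex
% Standard background, not the paper's own notation.
% (M_n) is a martingale with respect to a filtration (F_n) if, for all n >= 0,
\[
  \mathbb{E}\lvert M_n \rvert < \infty
  \quad\text{and}\quad
  \mathbb{E}\left[ M_{n+1} \mid \mathcal{F}_n \right] = M_n .
\]
% Doob's martingale convergence theorem: if the martingale is bounded in L^1,
\[
  \sup_{n} \mathbb{E}\lvert M_n \rvert < \infty ,
\]
% then there is an integrable random variable M_infinity such that
\[
  M_n \longrightarrow M_\infty \quad \text{almost surely as } n \to \infty .
\]
```

In particular, any martingale that is nonnegative or uniformly bounded satisfies the $L^1$ condition and therefore converges almost surely.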
