Pitfalls in the use of Parallel Inference for the Dirichlet Process

Recent work by Lovell, Adams, and Mansinghka (2012) and Williamson, Dubey, and Xing (2013) suggested an alternative parametrisation of the Dirichlet process in order to derive non-approximate parallel MCMC inference for it, an approach that has since been picked up and implemented in several different fields. In this paper we show that the suggested approach is impractical due to an extremely unbalanced distribution of the data across parallel nodes. We characterise the requirements of efficient parallel inference for the Dirichlet process and show that the proposed inference fails most of these requirements, while approximate approaches often satisfy most of them. We present both theoretical and experimental evidence, analysing the load balance of the inference and showing that the imbalance is independent of the size of the dataset and of the number of nodes available to the parallel implementation. We end with suggestions of alternative paths of research towards efficient non-approximate parallel inference for the Dirichlet process.
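The load-imbalance argument can be illustrated with a small simulation. The parametrisation in question distributes whole clusters of a Dirichlet process mixture across processors, so the most loaded node always holds at least the largest cluster; because the size-biased cluster weights of the DP do not shrink as the dataset grows, the imbalance persists regardless of dataset size or node count. The sketch below is not the authors' code: it samples a partition from the Chinese restaurant process and assigns clusters to nodes uniformly at random, both of which are illustrative assumptions, to show the fraction of data the busiest node receives.

```python
import random


def crp_partition(n, alpha, rng):
    """Sample cluster sizes for n items from a Chinese restaurant
    process with concentration parameter alpha."""
    counts = []
    for i in range(n):
        # New cluster with probability alpha / (i + alpha);
        # otherwise join an existing cluster size-biasedly.
        r = rng.uniform(0.0, i + alpha)
        if r < alpha:
            counts.append(1)
        else:
            acc = 0.0
            for k, c in enumerate(counts):
                acc += c
                if r - alpha < acc:
                    counts[k] += 1
                    break
    return counts


def max_node_share(n, alpha, num_nodes, rng):
    """Assign whole clusters to nodes uniformly at random (a simplifying
    assumption) and return the fraction of data on the busiest node."""
    counts = crp_partition(n, alpha, rng)
    loads = [0] * num_nodes
    for c in counts:
        loads[rng.randrange(num_nodes)] += c
    return max(loads) / n


if __name__ == "__main__":
    rng = random.Random(0)
    # A perfectly balanced run would give a share of 1 / num_nodes = 0.125;
    # under a cluster-wise assignment the share is bounded below by the
    # largest cluster's weight, which stays large for any n.
    for n in (1_000, 10_000):
        print(n, max_node_share(n, alpha=1.0, num_nodes=8, rng=rng))
```

Under this toy model, increasing `n` or `num_nodes` does not drive the busiest node's share towards the balanced value, mirroring the paper's claim that the imbalance is independent of both quantities.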

[1] Phil Blunsom, et al. A Systematic Bayesian Treatment of the IBM Alignment Models, 2013, HLT-NAACL.

[2] Yee Whye Teh, et al. Collapsed Variational Dirichlet Process Mixture Models, 2007, IJCAI.

[3] Radford M. Neal, et al. A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model, 2004.

[4] Thomas L. Griffiths, et al. Infinite Latent Feature Models and the Indian Buffet Process, 2005, NIPS.

[5] Eric P. Xing, et al. Parallel Markov Chain Monte Carlo for Nonparametric Mixture Models, 2013, ICML.

[6] Darren J. Wilkinson, et al. Parallel Bayesian Computation, 2005.

[7] Michael I. Jordan, et al. Nonparametric Bayesian Learning of Switching Linear Dynamical Systems, 2008, NIPS.

[8] John W. Fisher, et al. Parallel Sampling of DP Mixture Models Using Sub-Cluster Splits, 2013, NIPS.

[9] Yee Whye Teh, et al. A Hierarchical Bayesian Language Model Based on Pitman-Yor Processes, 2006, ACL.

[10] Max Welling, et al. Asynchronous Distributed Learning of Topic Models, 2008, NIPS.

[11] W. R. Schucany, et al. Handbook of Parallel Computing and Statistics, 2008.

[12] Paul Fearnhead, et al. Particle Filters for Mixture Models with an Unknown Number of Components, 2004, Stat. Comput.

[13] Michael I. Jordan, et al. Variational Methods for the Dirichlet Process, 2004, ICML.

[14] Alex A. Birklykke, et al. Markov Chain Algorithms: A Template for Building Future Robust Low-Power Systems, 2014, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[15] Alexander J. Smola, et al. Online Inference for the Infinite Topic-Cluster Model: Storylines from Streaming Text, 2011, AISTATS.

[16] T. Ferguson. A Bayesian Analysis of Some Nonparametric Problems, 1973.

[17] Ali Taylan Cemgil, et al. Sequential Monte Carlo Samplers for Dirichlet Process Mixtures, 2010, AISTATS.

[18] Hermann Ney, et al. Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation, 2008, COLING.

[19] Takashi Matsumoto, et al. Domain-Dependent/Independent Topic Switching Model for Online Reviews with Numerical Ratings, 2013, CIKM.

[20] Matthew T. Harrison, et al. A Simple Example of Dirichlet Process Mixture Inconsistency for the Number of Components, 2013, NIPS.

[21] J. Kingman. The Representation of Partition Structures, 1978.

[22] Gerald L. Detter. Rich Get Richer, 1999.

[23] Anthony Brockwell. Parallel Markov Chain Monte Carlo Simulation by Pre-Fetching, 2006.

[24] Michael I. Jordan, et al. Hierarchical Dirichlet Processes, 2006.

[25] Ryan P. Adams, et al. ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures, 2013, arXiv.

[26] Zoubin Ghahramani, et al. Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process, 2009, NIPS.