Are Gibbs-Type Priors the Most Natural Generalization of the Dirichlet Process?

Discrete random probability measures and the exchangeable random partitions they induce are key tools for addressing a variety of estimation and prediction problems in Bayesian inference. Here we focus on the family of Gibbs-type priors, a recent and elegant generalization of the Dirichlet and the Pitman-Yor process priors. These random probability measures share properties that are appealing from both a theoretical and an applied point of view: (i) they admit an intuitive predictive characterization that justifies their use in terms of a precise assumption on the learning mechanism; (ii) they stand out in terms of mathematical tractability; (iii) they include several interesting special cases besides the Dirichlet and the Pitman-Yor processes. The goal of our paper is to provide a systematic and unified treatment of Gibbs-type priors and to highlight their implications for Bayesian nonparametric inference. We deal with their distributional properties, the resulting estimators, their frequentist asymptotic validation and the construction of time-dependent versions. Applications, mainly concerning mixture models and species sampling, serve to convey the main ideas. The intuition inherent to this class of priors and the neat results they lead to make one wonder whether it actually represents the most natural generalization of the Dirichlet process.
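
As an illustration of the predictive characterization in point (i), the prediction rule of a Gibbs-type prior can be sketched as below. The notation is an assumption introduced here and is not fixed by the abstract: \sigma < 1 is a discount parameter, V_{n,k} are nonnegative weights, X_1^*, ..., X_k^* are the k distinct values among the first n observations with frequencies n_1, ..., n_k, and P_0 is a diffuse base measure.

```latex
% Predictive distribution of a Gibbs-type prior (sketch, assumed notation)
\[
  \mathbb{P}\bigl(X_{n+1} \in \cdot \mid X_1,\dots,X_n\bigr)
  = \frac{V_{n+1,k+1}}{V_{n,k}}\, P_0(\cdot)
  + \frac{V_{n+1,k}}{V_{n,k}} \sum_{j=1}^{k} (n_j - \sigma)\, \delta_{X_j^*}(\cdot),
\]
% The weights satisfy a forward recursion, which makes the total mass equal to one
\[
  V_{n,k} = (n - \sigma k)\, V_{n+1,k} + V_{n+1,k+1}, \qquad V_{1,1} = 1 .
\]
% Special cases, with (a)_n = a(a+1)\cdots(a+n-1):
%   Dirichlet process:  \sigma = 0,\; V_{n,k} = \theta^{k} / (\theta)_n
%   Pitman-Yor process: V_{n,k} = \prod_{i=1}^{k-1} (\theta + i\sigma) \big/ (\theta+1)_{n-1}
```

In this form the probability of observing a new value depends on the sample only through n and k, not on the individual frequencies. For the Dirichlet process it reduces to \theta/(\theta+n), and for the Pitman-Yor process to (\theta + k\sigma)/(\theta+n).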
