The Nested Dirichlet Process

In multicenter studies, subjects in different centers may have different outcome distributions. This article is motivated by the problem of nonparametric modeling of these distributions, borrowing information across centers while also allowing centers to be clustered. Starting with a stick-breaking representation of the Dirichlet process (DP), we replace the random atoms with random probability measures drawn from a DP. This results in a nested DP prior, which can be placed on the collection of distributions for the different centers, with centers drawn from the same DP component automatically clustered together. Theoretical properties are discussed, and an efficient Markov chain Monte Carlo algorithm is developed for computation. The methods are illustrated using a simulation study and an application to quality of care in U.S. hospitals.
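The construction described above can be illustrated with a minimal sketch under stated assumptions; it is not the authors' algorithm. It truncates both stick-breaking levels at finite sizes `K` and `L` and uses a standard normal base measure. The function names, the concentration parameters `alpha` and `beta`, the truncation levels, and the choice of base measure are all hypothetical and chosen only for illustration. Each outer atom is itself a discrete distribution (inner weights plus inner atoms), and centers that draw the same outer atom share one distribution, which is what clusters them.

```python
import numpy as np


def stick_breaking_weights(concentration, truncation, rng):
    """Truncated stick-breaking weights: w_k = v_k * prod_{l<k} (1 - v_l)."""
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_nested_dp(n_centers, alpha, beta, K=20, L=30, seed=0):
    """Draw a truncated nested-DP-style prior sample (illustrative only).

    alpha: concentration of the outer DP over distributions (assumed name)
    beta:  concentration of each inner DP over atoms (assumed name)
    K, L:  outer and inner truncation levels (assumptions of this sketch)
    """
    rng = np.random.default_rng(seed)
    # Outer level: weights over K candidate distributions.
    outer_weights = stick_breaking_weights(alpha, K, rng)
    # Inner level: each candidate distribution is its own truncated DP draw.
    inner_weights = np.stack(
        [stick_breaking_weights(beta, L, rng) for _ in range(K)]
    )
    inner_atoms = rng.normal(0.0, 1.0, size=(K, L))  # assumed base measure N(0, 1)
    # Each center picks one candidate distribution; equal labels mean the
    # centers share the same distribution and are clustered together.
    center_labels = rng.choice(K, size=n_centers, p=outer_weights)
    return center_labels, inner_weights, inner_atoms


center_labels, inner_weights, inner_atoms = sample_nested_dp(
    n_centers=10, alpha=1.0, beta=1.0
)
```

Centers `i` and `j` with `center_labels[i] == center_labels[j]` have identical outcome distributions in this sketch, given by row `center_labels[i]` of `inner_weights` and `inner_atoms`.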
