A Product Partition Model With Regression on Covariates

We propose a probability model for random partitions in the presence of covariates. In other words, we develop a model-based clustering algorithm that exploits available covariates. The motivating application is predicting time to progression for patients in a breast cancer trial. We report the prediction for a new patient as a weighted average of the responses in clusters of earlier patients, with weights determined by the similarity of the new patient's covariates to the covariates of the patients in each cluster. We achieve the desired inference by defining a random partition model that includes a regression on covariates, so that patients with similar covariates are a priori more likely to be clustered together. Posterior predictive inference in this model formalizes the desired prediction. We build on product partition models (PPMs) and define an extension of the PPM that includes a regression on covariates: the cohesion function gains a new factor that increases the probability that experimental units with similar covariates are included in the same cluster. We discuss implementations suitable for any combination of continuous, categorical, count, and ordinal covariates. An implementation of the proposed model is available for download as an R package.
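The extended cohesion can be illustrated with a minimal sketch. It assumes a prior of the form p(rho) ∝ ∏_j c(S_j) g(x*_j), where c is the usual PPM cohesion and g is the extra factor computed from the covariates x*_j of the units in cluster S_j. The specific choices are assumptions made for illustration only: a Dirichlet-process-style cohesion c(S) = M·(|S| − 1)!, and a factor g taken as the marginal likelihood of the cluster's covariates under a conjugate auxiliary model (normal-normal for one continuous covariate, Dirichlet-multinomial for one categorical covariate, multiplied together). All hyperparameter values, function names, and data below are hypothetical, and none of this is the interface of the R package mentioned above. The prediction step works conditionally on one fixed partition and uses cluster sample means (and the pooled mean for a new cluster) as simple stand-ins for cluster-specific predictive means; full posterior prediction would also average over partitions.

```python
import math
import numpy as np

# Hypothetical hyperparameters, chosen only for illustration.
M = 1.0       # cohesion scale in the assumed c(S) = M * (|S| - 1)!
M0 = 0.0      # prior mean of a cluster-specific covariate location xi
V0 = 1.0      # prior variance of xi
S2 = 0.5      # known within-cluster variance of the continuous covariate
ALPHA = 0.5   # symmetric Dirichlet parameter for the categorical covariate


def log_cohesion(size):
    """log c(S) = log M + log((|S| - 1)!), using lgamma(n) = log((n - 1)!)."""
    return math.log(M) + math.lgamma(size)


def log_sim_continuous(xs):
    """log marginal of xs under x_i ~ N(xi, S2), xi ~ N(M0, V0)."""
    xs = np.asarray(xs, dtype=float)
    n = len(xs)
    cov = S2 * np.eye(n) + V0 * np.ones((n, n))  # shared xi adds covariance V0
    diff = xs - M0
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)


def log_sim_categorical(zs, K):
    """log marginal of labels zs in {0..K-1} under a Dirichlet-multinomial model."""
    zs = np.asarray(zs, dtype=int)
    counts = np.bincount(zs, minlength=K)
    out = math.lgamma(K * ALPHA) - math.lgamma(K * ALPHA + len(zs))
    return out + sum(math.lgamma(ALPHA + c) - math.lgamma(ALPHA) for c in counts)


def log_cluster_term(xs, zs, K):
    """log[c(S) g(x*_S)], with g the product of one factor per covariate."""
    return log_cohesion(len(xs)) + log_sim_continuous(xs) + log_sim_categorical(zs, K)


def log_ppmx(partition, x, z, K):
    """Unnormalized log prior of a partition (list of index lists) given covariates."""
    return sum(log_cluster_term(x[S], z[S], K) for S in partition)


def predictive_weights(partition, x, z, x_new, z_new, K):
    """Prior probability that a new unit joins each cluster; last entry = new cluster."""
    logw = []
    for S in partition:
        xs, zs = x[S], z[S]
        joined = log_cluster_term(np.append(xs, x_new), np.append(zs, z_new), K)
        logw.append(joined - log_cluster_term(xs, zs, K))  # ratio of cluster terms
    logw.append(log_cluster_term([x_new], [z_new], K))  # singleton cluster
    logw = np.array(logw)
    w = np.exp(logw - logw.max())  # subtract max for numerical stability
    return w / w.sum()


# Hypothetical toy data: two covariates and a response for five earlier units.
x = np.array([-1.2, -0.9, -1.1, 1.0, 1.3])    # continuous covariate
z = np.array([0, 0, 1, 1, 1])                 # categorical covariate, K = 2 levels
y = np.array([10.0, 12.0, 11.0, 30.0, 28.0])  # responses
partition = [[0, 1, 2], [3, 4]]

w = predictive_weights(partition, x, z, x_new=1.1, z_new=1, K=2)
cluster_means = [y[S].mean() for S in partition] + [y.mean()]
y_hat = float(np.dot(w, cluster_means))  # weighted average of cluster responses
print(w.round(3), round(y_hat, 2))
```

Each cluster's weight contains the ratio g(x*_j ∪ x_new) / g(x*_j), so a new unit whose covariates resemble those of cluster j receives more weight there; this is the covariate-driven weighting the abstract describes. An ordinal or count covariate would enter the same way, as one more marginal-likelihood factor multiplied into g.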