Random effects clustering in multilevel modeling: choosing a proper partition

A novel criterion is presented for estimating a latent partition of the observed groups from the output of a hierarchical model. It is based on a loss function that combines the Gini income inequality ratio with the predictability index of Goodman and Kruskal, so as to maximize the heterogeneity of random effects across groups and the homogeneity of predicted probabilities within the estimated clusters. The criterion is compared with alternative approaches in a simulation study and applied in a case study on the role of hospital-level variables in the decision to perform a cesarean section.
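
The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the loss function combines them, so the code below computes each quantity separately, using their textbook definitions, and makes no attempt to reproduce the proposed criterion. The function names `gini_ratio` and `goodman_kruskal_tau` and the example counts are hypothetical and chosen only for illustration.

```python
def gini_ratio(values):
    """Gini ratio of a non-negative sample.

    Textbook definition: sum of absolute differences over all ordered
    pairs, divided by 2 * n^2 * mean.
    """
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("Gini ratio is undefined when the mean is zero")
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the column category from the row.

    `table` is a list of rows of counts (rows: clusters, columns: outcomes).
    The value is the proportional reduction in prediction error obtained
    by knowing the row category.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Sum of squared column proportions when the row is ignored.
    marginal = sum((c / n) ** 2 for c in col_totals)
    if marginal == 1:
        raise ValueError("tau is undefined when only one column is observed")
    # Same quantity computed within each non-empty row, weighted by row share.
    conditional = sum(
        (cell / n) ** 2 / (sum(row) / n)
        for row in table
        if sum(row) > 0
        for cell in row
    )
    return (conditional - marginal) / (1 - marginal)


# Hypothetical example: three clusters of hospitals (rows) cross-classified
# by delivery mode (columns: cesarean, vaginal). Counts are made up.
example_table = [[30, 70], [50, 50], [80, 20]]
tau = goodman_kruskal_tau(example_table)

# Hypothetical predicted probabilities within one cluster: a Gini ratio
# close to zero indicates a homogeneous cluster.
within_cluster_gini = gini_ratio([0.28, 0.30, 0.31, 0.29])
```

Under these definitions, a partition whose clusters separate the outcome well yields a tau close to 1, while a cluster whose members have similar predicted probabilities yields a Gini ratio close to 0.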
