Sample selection in the face of design constraints: Use of clustering to define sample strata for qualitative research

OBJECTIVE To sample 40 physician organizations stratified on the basis of longitudinal cost of care measures for qualitative interviews in order to describe the range of care delivery structures and processes that are being deployed to influence the total costs of caring for patients.

DATA SOURCES Three years of physician organization-level total cost of care data (n = 156 in California) from the Integrated Healthcare Association's value-based pay-for-performance program.

STUDY DESIGN We fit total cost of care data using mixture and K-means clustering algorithms to segment the population of physician organizations into sampling strata based on 3-year cost trajectories (ie, cost curves).

PRINCIPAL FINDINGS A mixture of multivariate normal distributions can classify physician organization cost curves into clusters defined by total cost level, shape, and within-cluster variation. K-means clustering does not accommodate differing levels of within-cluster variation and resulted in more clusters being allocated to unstable cost curves. A mixture of regressions approach focuses overly on anomalous trajectories and is sensitive to model coding.

CONCLUSIONS Statistical clustering can be used to form sampling strata when longitudinal measures are of primary interest. Many clustering algorithms are available; the choice of the clustering algorithm can strongly impact the resulting strata because various algorithms focus on different aspects of the observed data.
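
To make the stratification idea concrete, the following is a minimal sketch, not the authors' implementation. It clusters synthetic 3-year cost curves with a plain K-means (Lloyd's algorithm) and then draws 40 organizations across the resulting strata. The synthetic data, the choice of three clusters, the proportional allocation rule, and the helper names `kmeans` and `allocate` are all assumptions made for illustration; the abstract does not state how the 40 interviews were allocated to strata.

```python
import random


def kmeans(curves, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on equal-length cost trajectories (hypothetical helper)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(curves, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((x - m) ** 2 for x, m in zip(c, centers[j])))
            for c in curves
        ]
        # Update step: each center becomes the year-by-year mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, new_labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def allocate(sizes, total):
    """Proportional allocation with largest-remainder rounding (an assumed rule)."""
    n = sum(sizes)
    quotas = [s * total / n for s in sizes]
    alloc = [int(q) for q in quotas]
    leftover = total - sum(alloc)
    by_fraction = sorted(range(len(sizes)),
                         key=lambda j: quotas[j] - alloc[j], reverse=True)
    for j in by_fraction[:leftover]:
        alloc[j] += 1
    return alloc


# Synthetic stand-in for 156 organizations with 3 yearly cost values each.
rng = random.Random(1)
shapes = [(300, 310, 320), (500, 505, 510), (350, 420, 500)]  # made-up curve shapes
curves = [[m + rng.gauss(0, 10) for m in shapes[i % 3]] for i in range(156)]

k = 3
labels, centers = kmeans(curves, k)
strata = [[i for i, lab in enumerate(labels) if lab == j] for j in range(k)]
alloc = allocate([len(s) for s in strata], total=40)

# Simple random sample of organization indices within each stratum.
sample = [i for s, n_j in zip(strata, alloc) for i in rng.sample(s, n_j)]
print(alloc, len(sample))
```

Because K-means uses a single distance measure for every cluster, this sketch shares the limitation noted in the findings: it cannot represent clusters with different levels of within-cluster variation, which a mixture of multivariate normal distributions can.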
