A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data: SPSS TwoStep Cluster analysis, Latent Gold and SNOB

Background: There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA).

Methods: The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program’s ease of use and interpretability of the presentation of results.

We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known.

Results: The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied.

The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups.

Conclusions: Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets but we recognise that different clustering methods may suit other types of data and clinical research questions.
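Two of the quantitative criteria named in the Methods, the classification probability of individuals and the correct classification of individuals when subgroup membership is known, can be illustrated with a minimal Python sketch. It is not the procedure used by SPSS TwoStep, Latent Gold or SNOB. The posterior probabilities, the true labels and all function names below are invented for illustration, and the sketch assumes each program can export one posterior membership probability per subgroup for every individual.

```python
from itertools import permutations


def modal_assignment(posteriors):
    """Assign each individual to the subgroup with the highest posterior probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def mean_classification_probability(posteriors):
    """Average, over individuals, of the largest posterior membership probability."""
    return sum(max(p) for p in posteriors) / len(posteriors)


def best_match_accuracy(true_labels, predicted, n_groups):
    """Proportion correctly classified under the best relabelling of clusters.

    Cluster numbers are arbitrary, so every mapping from cluster number to
    known subgroup is tried and the highest agreement is kept. This brute
    force search is only practical for a small number of subgroups.
    """
    best = 0.0
    for perm in permutations(range(n_groups)):
        hits = sum(1 for t, c in zip(true_labels, predicted) if perm[c] == t)
        best = max(best, hits / len(true_labels))
    return best


# Hypothetical posterior probabilities for six individuals and three subgroups.
posteriors = [
    [0.95, 0.03, 0.02],
    [0.10, 0.85, 0.05],
    [0.05, 0.15, 0.80],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.40, 0.15, 0.45],
]
# Hypothetical known subgroup membership, as in an artificial dataset.
true_labels = [2, 0, 1, 2, 0, 2]

predicted = modal_assignment(posteriors)
print("Modal assignment:", predicted)
print("Mean classification probability:",
      round(mean_classification_probability(posteriors), 3))
print("Accuracy under best relabelling:",
      round(best_match_accuracy(true_labels, predicted, 3), 3))
```

The relabelling step matters because a clustering program may call a subgroup "cluster 1" where the artificial dataset calls it "subgroup 3"; comparing raw labels would understate agreement.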
