Classification and regression tree analysis in public health: Methodological review and comparison with logistic regression

Background: Audience segmentation strategies are of increasing interest to public health professionals who wish to identify easily defined, mutually exclusive population subgroups as a basis for targeted interventions. Members of each subgroup share characteristics that help determine whether they participate in a health-related behavior. Classification and regression tree (C&RT) analysis is a nonparametric decision tree methodology that can efficiently segment populations into meaningful subgroups. However, it is not commonly used in public health.

Purpose: This study provides a methodological overview of C&RT analysis for readers unfamiliar with the procedure.

Methods and Results: An example C&RT analysis is presented, and the interpretation of its results is discussed. The results are validated against those from a logistic regression model built to replicate the C&RT findings. The results of the example C&RT analysis are also compared with those from a common approach to logistic regression, the stepwise selection procedure. Issues to consider when deciding whether to use C&RT are discussed, and situations in which C&RT may or may not be beneficial are described.

Conclusions: C&RT is a promising research tool for identifying at-risk populations in public health research and outreach.
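The abstract describes C&RT only in outline. As a rough illustration of how recursive partitioning yields mutually exclusive subgroups, the following minimal Python sketch grows a tree for a binary outcome. At each node it picks the binary split that most reduces Gini impurity (one splitting criterion used in CART-style trees), and it stops when a maximum depth is reached or a split would leave fewer than `min_node` cases in a child. It omits pruning, cross-validation, surrogate splits, and multi-level categorical predictors, so it is not the procedure used in the study. The function names (`gini`, `best_split`, `grow`, `in_leaf`), the stopping parameters, and the toy data (`age`, `visit`, `vaccinated`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]
Condition = Tuple[str, str, float]  # (feature, "<=" or ">", threshold)
Leaf = Tuple[Tuple[Condition, ...], int, float]  # (path conditions, n, outcome proportion)


def gini(labels: List[int]) -> float:
    """Gini impurity of a 0/1 outcome: 1 - p^2 - (1 - p)^2."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(rows: List[Row], labels: List[int], features: List[str],
               min_node: int) -> Optional[Tuple[float, str, float]]:
    """Return (impurity reduction, feature, threshold) for the best binary split, or None."""
    n = len(labels)
    parent = gini(labels)
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate cut points: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if len(left) < min_node or len(right) < min_node:
                continue  # hypothetical stopping rule: each child needs min_node cases
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t)
    return best


def grow(rows: List[Row], labels: List[int], features: List[str],
         path: Tuple[Condition, ...] = (), depth: int = 0,
         max_depth: int = 3, min_node: int = 2) -> List[Leaf]:
    """Recursively partition rows; each returned leaf is one subgroup."""
    split = best_split(rows, labels, features, min_node) if depth < max_depth else None
    if split is None or split[0] <= 0:
        return [(path, len(labels), sum(labels) / len(labels))]
    _, f, t = split
    leaves: List[Leaf] = []
    for op in ("<=", ">"):
        # Keep the rows that fall on this side of the cut point.
        keep = [(r, y) for r, y in zip(rows, labels) if (r[f] <= t) == (op == "<=")]
        leaves += grow([r for r, _ in keep], [y for _, y in keep], features,
                       path + ((f, op, t),), depth + 1, max_depth, min_node)
    return leaves


def in_leaf(row: Row, conditions: Tuple[Condition, ...]) -> bool:
    """True if the row satisfies every condition on the path to a leaf."""
    return all((row[f] <= t) if op == "<=" else (row[f] > t) for f, op, t in conditions)


# Hypothetical toy data, invented for illustration only.
data = [
    {"age": 50, "visit": 0}, {"age": 55, "visit": 1}, {"age": 60, "visit": 0},
    {"age": 70, "visit": 1}, {"age": 75, "visit": 1}, {"age": 80, "visit": 1},
]
vaccinated = [0, 0, 0, 1, 0, 1]

for conds, n, p in grow(data, vaccinated, ["age", "visit"]):
    print(" and ".join(f"{f} {op} {t}" for f, op, t in conds), n, round(p, 2))
```

On this toy data the sketch makes a single split, printing `age <= 65.0 3 0.0` and `age > 65.0 3 0.67`. Because each leaf is defined by the conjunction of conditions on its path, every record falls into exactly one subgroup, which is what makes tree output usable for audience segmentation. In a logistic regression, the same subgroups could be represented as indicator or interaction terms, which is one way a regression model can be set up to check tree results.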