An Introduction to Classification and Regression Tree (CART) Analysis

Introduction

A common goal of many clinical research studies is the development of a reliable clinical decision rule, which can be used to classify new patients into clinically important categories. Examples of such clinical decision rules include triage rules, whether used in the out-of-hospital setting or in the emergency department, and rules used to classify patients into various risk categories so that appropriate decisions can be made regarding treatment or hospitalization. Traditional statistical methods are cumbersome to use, or of limited utility, in addressing these types of classification problems. There are a number of reasons for these difficulties.

First, there are generally many possible "predictor" variables, which makes the task of variable selection difficult. Traditional statistical methods are poorly suited for this sort of multiple comparison. Second, the predictor variables are rarely nicely distributed. Many clinical variables are not normally distributed, and different groups of patients may have markedly different degrees of variation or variance. Third, complex interactions or patterns may exist in the data. For example, the value of one variable (e.g., age) may substantially affect the importance of another variable (e.g., weight). These types of interactions are generally difficult to model, and virtually impossible to model when the number of interactions and variables becomes substantial. Fourth, the results of traditional methods may be difficult to use. For example, a multivariate logistic regression model yields a probability of disease, which can be calculated using the regression coefficients and the characteristics of the patient, yet such models are rarely utilized in clinical practice. Clinicians generally do not think in terms of probability but rather in terms of categories, such as "low risk" versus "high risk."

Regardless of the statistical methodology being used, the creation of a clinical decision rule requires a relatively large dataset. For each patient in the dataset, one variable (the dependent variable) records whether or not that patient had the condition which we hope to predict accurately in future patients. Examples might include significant injury after trauma, myocardial infarction, or subarachnoid hemorrhage in the setting of headache. In addition, other variables record the values of patient characteristics which we believe might help us to predict the value of the dependent variable. For example, if one hopes to predict the presence of subarachnoid hemorrhage, a possible predictor variable might be whether or not the patient's headache was sudden in onset; another possible …

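The contrast drawn in the introduction, between the probability that a logistic regression model yields and the categories clinicians tend to use, can be made concrete with a small sketch. Everything specific here is an assumption made for illustration: the predictor names, the coefficient values, and the 0.5 threshold are invented and do not come from the paper or from any study.

```python
import math

# Hypothetical intercept and coefficients, invented for illustration only.
INTERCEPT = -3.0
COEFFICIENTS = {"sudden_onset": 2.0, "age_over_50": 1.0}


def predicted_probability(patient):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))


def risk_category(probability, threshold=0.5):
    """Collapse a probability into a label (threshold is an assumed cut-off)."""
    return "high risk" if probability >= threshold else "low risk"


# One hypothetical patient: sudden-onset headache, not over 50.
patient = {"sudden_onset": 1, "age_over_50": 0}
p = predicted_probability(patient)
print(f"probability = {p:.3f}, category = {risk_category(p)}")
```

The model's natural output is the number `p`. Turning it into a usable category requires an extra, somewhat arbitrary step (the threshold), which is one way to see why such models may be awkward to apply at the bedside.
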