An Improved Algorithm for CART Based on the Rough Set Theory

Prediction and classification are critical tasks in the analysis of medical nutrition data. Decision tree algorithms are widely used in this field because they are intuitive, efficient, and easy to understand. However, the classification rules extracted from a decision tree are often not the simplest or most efficient ones possible. This paper analyzes the classical decision tree algorithm CART and proposes an improved algorithm, R2-CART. The core idea is to combine CART with rough set theory, applying attribute reduction and rule reduction to the classification rules extracted from the decision tree in order to simplify both the rules and the tree. An experiment comparing the original CART algorithm with the improved algorithm shows that the improved algorithm achieves much better classification efficiency while also yielding a simple and efficient set of classification rules. The improved algorithm has potential practical value for the classification and predictive analysis of large-scale medical nutrition data.
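The abstract does not spell out how R2-CART performs its reduction, so the following is only a minimal sketch of rough-set attribute reduction in general, not the paper's procedure. It assumes a greedy backward elimination driven by the standard dependency degree (the fraction of records whose condition-attribute values determine the class unambiguously). The function names, the attribute names (`bmi`, `salt`, `exercise`), and the toy records are all hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in single-class equivalence blocks."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def reduce_attributes(rows, labels, attrs):
    """Greedily drop each attribute whose removal leaves the dependency unchanged."""
    full = dependency(rows, labels, attrs)
    kept = list(attrs)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if dependency(rows, labels, trial) == full:
            kept = trial
    return kept

# Hypothetical toy decision table (not data from the paper).
rows = [
    {"bmi": "high", "salt": "high", "exercise": "low"},
    {"bmi": "high", "salt": "low",  "exercise": "low"},
    {"bmi": "low",  "salt": "high", "exercise": "high"},
    {"bmi": "low",  "salt": "low",  "exercise": "high"},
    {"bmi": "high", "salt": "high", "exercise": "high"},
    {"bmi": "low",  "salt": "low",  "exercise": "low"},
]
labels = ["risk", "risk", "ok", "ok", "risk", "ok"]

reduct = reduce_attributes(rows, labels, ["bmi", "salt", "exercise"])
print(reduct)  # ['bmi']
```

In this toy table the class is fully determined by `bmi`, so the other two attributes can be dropped without losing any discriminating power. Rules extracted from a tree could be shortened in the same spirit by removing conditions on the dropped attributes, although the greedy order used here does not guarantee a minimal reduct in general.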
