Evaluation of decision trees: a multi-criteria approach

Data mining (DM) techniques are increasingly used in many modern organizations to retrieve valuable knowledge structures from organizational databases, including data warehouses. An important knowledge structure that can result from data mining activities is the decision tree (DT), which is used for the classification of future events. A decision tree is induced through a supervised knowledge discovery process in which prior knowledge regarding classes in the database is used to guide the discovery. Generating a DT is a relatively easy task, but to select the most appropriate DT the DM project team must generate and analyze a significant number of DTs based on multiple performance measures. We propose a multi-criteria decision analysis based process that would empower DM project teams to do thorough experimentation and analysis without being overwhelmed by the task of analyzing a significant number of DTs, and that would thereby offer a positive contribution to the DM process. We also offer some new approaches for measuring some of the performance criteria.
