Analysis of Multi-Criteria Methods for Classifier and Algorithm Evaluation

Evaluation of classifiers and supervised learning algorithms has traditionally been carried out by estimating predictive accuracy, e.g., via cross-validation. More recently, ROC analysis has been shown to be a good alternative. However, the characteristics of classification problems vary greatly across domains, and it has been shown that different evaluation methods are appropriate for different problems. Moreover, for many problems, methods that combine two or more criteria have been successful. We introduce the software engineering concepts of quality attributes and metrics to shift focus from the question of which evaluation method to use toward which quality attributes need to be assessed for a particular application. We analyze a large number of metrics and categorize them according to the attribute(s) they address. Finally, we propose a structured way to evaluate classifiers and supervised learning algorithms, as well as a generic multi-criteria metric that can combine arbitrary metrics.
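As a rough illustration of the multi-criteria idea (not the specific metric defined in the paper), the sketch below combines several normalized per-classifier scores into a single weighted value. The metric names, weights, and normalization scheme are assumptions chosen for illustration only.

```python
# Hedged sketch: a generic weighted multi-criteria score for classifier
# evaluation. The metrics, weights, and normalization below are illustrative
# assumptions, not the particular metric proposed in the paper.

def multi_criteria_score(raw_scores, weights, higher_is_better):
    """Combine arbitrary evaluation metrics into a single score in [0, 1].

    raw_scores       -- dict: metric name -> measured value in [0, 1]
    weights          -- dict: metric name -> non-negative importance weight
    higher_is_better -- dict: metric name -> True if larger is better
                        (e.g. accuracy, AUC), False if smaller is better
                        (e.g. normalized training time, model size)
    """
    total_weight = sum(weights.values())
    score = 0.0
    for name, value in raw_scores.items():
        # Invert metrics where smaller is better so that 1 is always best.
        v = value if higher_is_better[name] else 1.0 - value
        score += weights[name] * v
    return score / total_weight


if __name__ == "__main__":
    # Example: accuracy, AUC, and normalized training time for one classifier.
    raw = {"accuracy": 0.91, "auc": 0.95, "training_time": 0.30}
    w = {"accuracy": 0.5, "auc": 0.3, "training_time": 0.2}
    hib = {"accuracy": True, "auc": True, "training_time": False}
    print(f"combined score: {multi_criteria_score(raw, w, hib):.3f}")
```

The weights express the relative importance of each quality attribute for a given application; changing them changes which classifier is ranked best, which is exactly the application-dependence the abstract emphasizes.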
