A Survey on Graphical Methods for Classification Predictive Performance Evaluation

Predictive performance evaluation is a fundamental issue in the design, development, and deployment of classification systems. Because predictive performance evaluation is a multidimensional problem, single scalar summaries such as the error rate, although convenient due to their simplicity, can seldom capture all the aspects that a complete and reliable evaluation must consider. For this reason, graphical performance evaluation methods are increasingly drawing the attention of the machine learning, data mining, and pattern recognition communities. The main advantage of these methods lies in their ability to depict the trade-offs among evaluation aspects in a multidimensional space, rather than reducing those aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, selecting a suitable graphical method for a given task requires identifying each method's strengths and weaknesses. This paper surveys graphical methods commonly used for predictive performance evaluation. By presenting these methods within a common framework, we hope to shed some light on which methods are more suitable in different situations.
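As a concrete illustration of the abstract's central point, the minimal Python sketch below (a hypothetical example using NumPy and scikit-learn, not code from the surveyed paper) contrasts a scalar summary with a graphical one on a synthetic, class-imbalanced problem: the error rate compresses performance into a single number at one arbitrary threshold, while the ROC curve exposes the full trade-off between true-positive and false-positive rates across all thresholds.

    import numpy as np
    from sklearn.metrics import accuracy_score, roc_curve, roc_auc_score

    rng = np.random.default_rng(0)

    # Synthetic, imbalanced binary problem: 1,000 examples, ~10% positives;
    # positives receive somewhat higher classifier scores on average.
    y_true = (rng.random(1000) < 0.10).astype(int)
    y_score = np.where(y_true == 1,
                       rng.normal(0.65, 0.15, size=y_true.size),
                       rng.normal(0.45, 0.15, size=y_true.size))

    # Scalar summary: error rate at a single, arbitrarily chosen threshold.
    y_pred = (y_score >= 0.5).astype(int)
    print(f"error rate @ 0.5: {1.0 - accuracy_score(y_true, y_pred):.3f}")

    # Graphical view: the ROC curve traces the trade-off between the
    # true-positive rate (TPR) and the false-positive rate (FPR) over all
    # thresholds; the area under it (AUC) summarizes the curve as a scalar.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print(f"AUC: {roc_auc_score(y_true, y_score):.3f}")
    for f, t, th in zip(fpr[:5], tpr[:5], thresholds[:5]):
        print(f"threshold={th:.2f}  FPR={f:.3f}  TPR={t:.3f}")

Note that on this data a degenerate classifier that always predicts the majority class would achieve an error rate of only about 0.10 while being useless for ranking; this is exactly the kind of distortion that graphical methods such as the ROC curve make visible.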
