Introduction to Classification Algorithms and Their Performance Analysis Using Medical Examples

In this chapter, we give an introduction to classification algorithms and the metrics used to quantify and visualize their performance. We first briefly explain what we mean by a classification algorithm and, as an example, describe the naive Bayesian classification algorithm in more detail (a minimal sketch follows below). Using the concept of a confusion matrix, we next define the various performance metrics that can be derived from it, including sensitivity and specificity, which define the two dimensions of ROC space. We then argue that correctly evaluating the performance of a classification algorithm requires taking into account the conditions under which the algorithm has to operate in practice. These so-called operating conditions consist of two elements: class skew and cost skew. We show that both elements can be combined into a single cost parameter, and that iso-cost curves are straight lines in ROC space.
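
Since the chapter body is not reproduced here, the following is only a minimal sketch of a naive Bayesian classifier of the kind described above, assuming binary features and Laplace smoothing; the function names and toy data are illustrative, not taken from the chapter.

    import math

    def train_naive_bayes(X, y, alpha=1.0):
        """Estimate class priors and per-feature likelihoods from binary
        feature vectors X and class labels y, with Laplace smoothing."""
        classes = sorted(set(y))
        n_features = len(X[0])
        priors = {c: (y.count(c) + alpha) / (len(y) + alpha * len(classes))
                  for c in classes}
        # likelihoods[c][j] = estimated P(feature j = 1 | class c)
        likelihoods = {}
        for c in classes:
            rows = [x for x, label in zip(X, y) if label == c]
            likelihoods[c] = [(sum(x[j] for x in rows) + alpha) /
                              (len(rows) + 2 * alpha)
                              for j in range(n_features)]
        return classes, priors, likelihoods

    def predict_naive_bayes(x, classes, priors, likelihoods):
        """Return the class maximizing P(c) * prod_j P(x_j | c), i.e. the
        'naive' conditional-independence assumption, in log space."""
        best, best_logp = None, float("-inf")
        for c in classes:
            logp = math.log(priors[c])
            for j, v in enumerate(x):
                p = likelihoods[c][j]
                logp += math.log(p if v else 1.0 - p)
            if logp > best_logp:
                best, best_logp = c, logp
        return best

    if __name__ == "__main__":
        X = [[1, 1], [1, 0], [0, 0], [0, 1]]          # toy symptom vectors
        y = ["sick", "sick", "healthy", "healthy"]    # toy diagnoses
        model = train_naive_bayes(X, y)
        print(predict_naive_bayes([1, 1], *model))    # -> "sick"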

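The confusion-matrix metrics and the single cost parameter mentioned in the abstract can likewise be made concrete. The sketch below uses the standard expected-cost convention from ROC analysis (positive-class fraction times false-negative cost plus negative-class fraction times false-positive cost); the function names are ours, and the exact formulation in the chapter may differ.

    def confusion_counts(y_true, y_pred, positive=1):
        """Tally the four cells of the binary confusion matrix."""
        tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
        fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
        fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
        tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
        return tp, fp, fn, tn

    def sensitivity_specificity(tp, fp, fn, tn):
        """Sensitivity (true positive rate) and specificity (true negative
        rate): the two dimensions of ROC space."""
        return tp / (tp + fn), tn / (tn + fp)

    def expected_cost(sens, spec, pos_frac, c_fn, c_fp):
        """Expected misclassification cost under given operating conditions.
        Class skew (pos_frac) and cost skew (c_fn vs. c_fp) fold into one
        linear expression in (FPR, TPR) coordinates, which is why iso-cost
        curves are straight lines in ROC space, with slope
        ((1 - pos_frac) * c_fp) / (pos_frac * c_fn)."""
        return (pos_frac * c_fn * (1 - sens)
                + (1 - pos_frac) * c_fp * (1 - spec))

Holding expected_cost constant and solving for sensitivity as a function of the false positive rate (1 - spec) yields a line with the slope noted in the docstring, so classifiers with equal cost lie on parallel straight lines in ROC space.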