Classifier Technology and the Illusion of Progress

A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.
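As a minimal sketch of the kind of comparison at issue (not an experiment from the paper itself), the following Python snippet contrasts a simple classifier, linear discriminant analysis, with a more sophisticated one, an RBF-kernel support vector machine, under cross-validation. The dataset (the UCI breast cancer data bundled with scikit-learn), the choice of models, and the 10-fold evaluation are all assumptions made purely for illustration; on many such benchmark datasets the gap between the two mean accuracies is small relative to their fold-to-fold variability, which is the flavor of the argument above.

```python
# Illustrative sketch only (not the paper's method): compare a simple
# classifier with a sophisticated one via cross-validation. Dataset and
# model choices here are assumptions for demonstration purposes.
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Simple method: linear discriminant analysis, essentially Fisher (1936).
lda = LinearDiscriminantAnalysis()

# Sophisticated method: RBF-kernel SVM, with feature scaling.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

for name, model in [("LDA", lda), ("SVM", svm)]:
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy {scores.mean():.3f} "
          f"(fold std {scores.std():.3f})")
```

Note that this sketch measures only cross-validated accuracy on a fixed sample; the sources of uncertainty the paper emphasizes (drifting populations, uncertain misallocation costs, ambiguous class definitions) lie outside what such a comparison can capture.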
