Dealing with Unknown Priors in Supervised Classification

In this work, we examine minimum expected error rate and minimum expected cost decision making in the presence of uncertainty about the class priors. More precisely, we train a classifier on a training set and, once its parameters are estimated, apply it to new real-world data that have to be labeled. We thus examine the situation in which the a priori probabilities of the classes (the priors) in the real-world data set are unknown and suspected to differ from those encountered in the training set, while the within-class densities remain unchanged (new sampling conditions). This problem is known as the "unbalanced data set problem" in the machine learning community. Various scenarios are considered:

(1) the priors of the training set and of the real-world data set are the same (simple Bayesian decision making);
(2) the priors of the two sets are different, but the new priors are known;
(3) the priors of the two sets are different and the new priors are unknown, but they can be estimated on the new data set;
(4) the priors of the two sets are different and the real-world data set is not accessible, so that no estimate of the priors can be computed.

All these cases are discussed from a decision-making point of view, with the aim of optimizing the classification results under the new sampling conditions. In particular, we show that when no information at all is available about the sampling conditions (the priors) under which the classification model will be applied, the optimal decision rule is based on the likelihood alone, that is, it assumes equal priors for all classes. This justifies the rule of thumb usually applied in this situation: train the classifier with equal proportions of observations from each class.

Marco Saerens (the corresponding author) and Nathalie Souchon are with the ISYS Unit (Information Systems Research Unit), IAG, Université catholique de Louvain, Place des Doyens 1, B-1348 Louvain-la-Neuve, Belgium. Email: {saerens, souchon}@isys.ucl.ac.be. Jean-Michel Renders is with the Xerox Research Centre Europe, Chemin de Maupertuis 6, 38240 Meylan (Grenoble), France. Email: jean-michel.renders@xrce.xerox.com. Christine Decaestecker is a Senior Research Assistant of the F.N.R.S. and is with the Laboratory of Toxicology, Institute of Pharmacy, Université Libre de Bruxelles, Campus Plaine CP 205/1, Boulevard du Triomphe, B-1050 Bruxelles, Belgium. Email: cdecaes@ulb.ac.be.

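To make scenarios (2) and (3) concrete, here is a minimal sketch in Python; the function names and the toy data are my own illustrations, not part of the paper. It assumes a probabilistic classifier whose outputs approximate the training-set posteriors p_train(y | x). Since the within-class densities are unchanged, Bayes' theorem gives the corrected posteriors by rescaling with the prior ratios and renormalizing, p_new(y | x) ∝ (π_new(y) / π_train(y)) · p_train(y | x). When the new priors are unknown but unlabeled new data are available, they can be estimated by the EM procedure described by Saerens, Latinne, and Decaestecker (Neural Computation, 2002).

```python
import numpy as np

def adjust_posteriors(train_posteriors, train_priors, new_priors):
    """Rescale training-set posteriors to new priors (scenario 2).

    train_posteriors: (N, K) array of p_train(y | x) for N samples, K classes.
    train_priors:     (K,) class priors in the training set.
    new_priors:       (K,) class priors under the new sampling conditions.
    """
    unnorm = train_posteriors * (new_priors / train_priors)  # pi_new / pi_train correction
    return unnorm / unnorm.sum(axis=1, keepdims=True)        # renormalize over classes

def em_estimate_priors(train_posteriors, train_priors, n_iter=100, tol=1e-6):
    """Estimate unknown new priors from unlabeled data via EM (scenario 3)."""
    new_priors = train_priors.copy()  # start from the training priors
    for _ in range(n_iter):
        # E-step: posteriors under the current prior estimate.
        posteriors = adjust_posteriors(train_posteriors, train_priors, new_priors)
        # M-step: new prior estimate = average responsibility per class.
        updated = posteriors.mean(axis=0)
        if np.max(np.abs(updated - new_priors)) < tol:
            return updated
        new_priors = updated
    return new_priors

# Toy usage (hypothetical numbers): training set was balanced, but the
# classifier's posteriors on the new data lean heavily toward class 0.
rng = np.random.default_rng(0)
train_priors = np.array([0.5, 0.5])
fake_posteriors = rng.dirichlet([2.0, 0.5], size=1000)
print(em_estimate_priors(fake_posteriors, train_priors))
```

In scenario (4), no such correction is possible, which is where the paper's result applies: with no information about the new priors, decisions should be based on the likelihood alone, i.e., equal priors for all classes.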