Maximum Likelihood in Cost-Sensitive Learning: Model Specification, Approximations, and Upper Bounds

Asymmetric misclassification costs and unequal class prevalences are common in pattern classification. Although cost-sensitive learning techniques have been studied extensively, their relationship to the specification of the model set in a parametric estimation framework remains unclear. To clarify it, we distinguish the case in which the model set includes the true posterior from the case in which the model is misspecified. In the former case, we show that thresholding the maximum likelihood (ML) estimate is an asymptotically optimal solution to the risk minimization problem. Under model misspecification, by contrast, we show that thresholded ML is suboptimal and that the risk-minimizing solution varies with the misclassification cost ratio. Moreover, we show analytically that the negative weighted log likelihood (Elkan, 2001) is a tight, convex upper bound on the empirical loss. Based on this analysis and on empirical results from several real-world data sets, we argue that weighted ML is the preferred cost-sensitive technique.
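The two approaches contrasted above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the cost symbols `c_fp` and `c_fn`, and the choice of base-2 logarithms are assumptions made for illustration. The first function gives the standard cost-dependent threshold applied to an ML posterior estimate; the other two compare the negative weighted log likelihood with the cost-weighted empirical loss of the decision rule that thresholds at 0.5. With logarithms in base 2, each misclassified example contributes at least its cost to the weighted log likelihood, which is one simple way to see why it upper-bounds the empirical loss.

```python
import numpy as np

def bayes_threshold(c_fp, c_fn):
    """Threshold on p(y=1|x) that minimizes expected cost when the
    posterior is correct (assumed costs: c_fp per false positive,
    c_fn per false negative, zero cost for correct decisions)."""
    return c_fp / (c_fp + c_fn)

def weighted_nll(p, y, c_fp, c_fn):
    """Negative weighted log likelihood in bits: positive examples are
    weighted by c_fn, negative examples by c_fp."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.sum(c_fn * y * np.log2(p) + c_fp * (1 - y) * np.log2(1 - p))

def empirical_cost(p, y, c_fp, c_fn, threshold=0.5):
    """Total misclassification cost of the rule 'predict 1 if p > threshold'."""
    pred = (p > threshold).astype(int)
    false_neg = (y == 1) & (pred == 0)
    false_pos = (y == 0) & (pred == 1)
    return c_fn * false_neg.sum() + c_fp * false_pos.sum()

if __name__ == "__main__":
    # Hypothetical posterior estimates and labels, for illustration only.
    p = np.array([0.9, 0.4, 0.2, 0.7])
    y = np.array([1, 1, 0, 0])
    print("threshold:", bayes_threshold(1.0, 5.0))
    print("weighted NLL (bits):", weighted_nll(p, y, 1.0, 5.0))
    print("empirical cost:", empirical_cost(p, y, 1.0, 5.0))
```

In this sketch, thresholded ML corresponds to fitting an unweighted model and cutting its output at `bayes_threshold(c_fp, c_fn)`, whereas weighted ML corresponds to minimizing `weighted_nll` over the model parameters and cutting at 0.5. The sketch does not reproduce the paper's asymptotic or tightness arguments.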

[1] Miroslav Dudík, et al. Generative and Discriminative Learning with Unknown Labeling Bias, 2008, NIPS.

[2] Tom Fawcett, et al. Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions, 1997, KDD.

[3] John Langford, et al. Cost-sensitive learning by cost-proportionate example weighting, 2003, Third IEEE International Conference on Data Mining.

[4] M. Maloof. Learning When Data Sets are Imbalanced and When Costs are Unequal and Unknown, 2003.

[5] Tom Fawcett, et al. ROC Graphs: Notes and Practical Considerations for Researchers, 2007.

[6] R. Fletcher. Practical Methods of Optimization, 1988.

[7] Nitesh V. Chawla, et al. Editorial: special issue on learning from imbalanced data sets, 2004, SKDD.

[8] Yi Lin, et al. Support Vector Machines for Classification in Nonstandard Situations, 2002, Machine Learning.

[9] Bianca Zadrozny, et al. Learning and making decisions when costs and probabilities are both unknown, 2001, KDD '01.

[10] James P. Egan, et al. Signal detection theory and ROC analysis, 1975.

[11] S. Kay. Fundamentals of statistical signal processing: estimation theory, 1993.

[12] Lucas C. Parra, et al. Recipes for the linear analysis of EEG, 2005, NeuroImage.

[13] P. Sajda, et al. Spatiotemporal Linear Decoding of Brain State, 2008, IEEE Signal Processing Magazine.

[14] J. Horowitz. A Smoothed Maximum Score Estimator for the Binary Response Model, 1992.

[15] Robert C. Holte, et al. Explicitly representing expected cost: an alternative to ROC representation, 2000, KDD '00.

[16] H. Shimodaira, et al. Improving predictive inference under covariate shift by weighting the log-likelihood function, 2000.

[17] P. McCullagh, et al. Generalized Linear Models, 2nd Edn., 1990.

[18] David G. Stork, et al. Pattern Classification, 1973.

[19] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[20] Eric R. Ziegel, et al. Generalized Linear Models, 2002, Technometrics.

[21] Charles Elkan, et al. The Foundations of Cost-Sensitive Learning, 2001, IJCAI.

[22] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[23] Jesús Cid-Sueiro, et al. Local estimation of posterior class probabilities to minimize classification errors, 2004, IEEE Transactions on Neural Networks.

[24] Pedro M. Domingos. MetaCost: a general method for making classifiers cost-sensitive, 1999, KDD '99.

[25] L. Parra, et al. Spatio-temporal Linear Decoding of Brain State: Application to Performance Augmentation in High-throughput Tasks (IEEE Signal Processing Magazine, accepted for publication, August 2007), 2022.

[26] Halbert White, et al. The construction of empirical credit scoring rules based on maximization principles, 2010.

[27] Nuno Vasconcelos, et al. Risk minimization, probability elicitation, and cost-sensitive SVMs, 2010, ICML.

[28] Vladimir Vapnik, et al. An overview of statistical learning theory, 1999, IEEE Trans. Neural Networks.

[29] H. White. Maximum Likelihood Estimation of Misspecified Models, 1982.