A new framework for optimal classifier design

The use of alternative measures to evaluate classifier performance is gaining attention, especially for imbalanced problems. However, how to use these measures in the classifier design process itself remains an open problem. In this work we propose a classifier designed specifically to optimize one of these alternative measures, namely the F-measure. The technique is nevertheless general and can be used to optimize other evaluation measures. We propose an algorithm to train the novel classifier and test the numerical scheme on several databases; the results show the optimality and robustness of the presented classifier.
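As a minimal illustration of why the F-measure is preferred over accuracy on imbalanced problems, the sketch below computes both from confusion counts. It assumes the standard F_beta definition from information retrieval; the abstract does not state which variant the paper optimizes, and the function names and the example counts are hypothetical, not taken from the paper's experiments.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F_beta from confusion counts (assumed standard definition).

    Equivalent to (1 + beta^2) * P * R / (beta^2 * P + R), with
    precision P = tp / (tp + fp) and recall R = tp / (tp + fn).
    Returns 0.0 when the score is undefined (no positives at all).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0


def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical imbalanced problem: 1000 samples, only 10 positives.
# A classifier that labels everything negative looks excellent by accuracy
# but scores zero on the F-measure.
all_negative = dict(tp=0, fp=0, fn=10, tn=990)
print(accuracy(**all_negative))          # 0.99
print(f_measure(tp=0, fp=0, fn=10))      # 0.0

# A classifier that finds 8 of the 10 positives at the cost of 12 false alarms
# has slightly lower accuracy but a much higher F-measure.
detector = dict(tp=8, fp=12, fn=2, tn=978)
print(accuracy(**detector))              # 0.986
print(f_measure(tp=8, fp=12, fn=2))      # 16/30, about 0.533
```

The true negatives do not enter the F-measure at all, which is what makes it insensitive to the size of the majority class.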
