Example-dependent cost-sensitive adaptive boosting

Abstract Intelligent computer systems aim to help humans make decisions. Many practical decision-making problems are classification problems by nature, but standard classification algorithms are often not applicable since they assume a balanced class distribution and constant misclassification costs. From this point of view, algorithms that consider the cost of decisions are essential, since they are more consistent with the requirements of real life. These algorithms generate decisions that directly optimize parameters valuable for business, for example, cost savings. Despite the practical value of cost-sensitive algorithms, only a small number of works study this problem, and they concentrate mainly on the case where the cost of a classifier error is constant and does not depend on a specific example. However, many real-world classification tasks are example-dependent cost-sensitive (ECS), where misclassification costs vary between examples and not only between classes. Existing methods of ECS learning are limited to modifications of the simplest machine learning models (naive Bayes, logistic regression, decision trees). These models produce promising results, but there is a need for further improvement in performance, which can be achieved by using gradient-based ensemble methods. To close this gap, we present an ECS generalization of AdaBoost. We study three models that differ in how the cost is introduced into the loss function: inside the exponent, outside the exponent, and both inside and outside the exponent. The results of experiments on three synthetic and two real datasets (bank marketing and insurance fraud) show that example-dependent cost-sensitive modifications of AdaBoost outperform other known models. Empirical results also show that the critical factors influencing the choice of model are not only the distribution of features, which is typical for cost-insensitive and class-dependent cost-sensitive problems, but also the distribution of costs. Next, since the outputs of AdaBoost are not well-calibrated posterior probabilities, we examine three approaches to calibrating classifier scores: Platt scaling, isotonic regression, and ROC modification. The results show that calibration not only significantly improves the performance of specific ECS models but also unlocks better capabilities of the original AdaBoost. The obtained results provide new insight into the behavior of cost-sensitive models from a theoretical point of view and show that the presented approach can significantly improve the practical design of intelligent systems.
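
As a rough illustration of the first variant described above (cost placed inside the exponent), the sketch below modifies the standard discrete AdaBoost weight update by scaling each example's margin by its misclassification cost. The function names, the stump weak learner, and the exact placement of the cost term are assumptions made for illustration; they are not the paper's exact formulation of the three ECS models.

```python
# Minimal sketch of an example-dependent cost-sensitive AdaBoost round,
# with the per-example cost c_i placed inside the exponent (one plausible variant).
# Assumptions: binary labels y in {-1, +1}, non-negative costs, decision stumps.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ecs_adaboost_fit(X, y, cost, n_rounds=50):
    y = np.asarray(y)
    cost = np.asarray(cost, dtype=float)
    n = len(y)
    w = np.full(n, 1.0 / n)                      # example weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # standard AdaBoost learner weight
        # cost inside the exponent: costly examples receive larger weight updates
        w *= np.exp(-alpha * y * pred * cost)
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def ecs_adaboost_score(X, learners, alphas):
    # unnormalised ensemble margin; its sign gives the predicted class
    return sum(a * m.predict(X) for a, m in zip(alphas, learners))

# usage: scores = ecs_adaboost_score(X_test, *ecs_adaboost_fit(X_train, y_train, cost_train))
```

For the calibration step, Platt scaling and isotonic regression can be applied to a standard boosted model with scikit-learn's CalibratedClassifierCV (method='sigmoid' or method='isotonic'); the ROC-based modification mentioned in the abstract has no off-the-shelf implementation there.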
