Cost-sensitive meta-learning framework

Purpose This paper describes a meta-learning framework for recommending cost-sensitive classification methods. It aims to answer an important question that arises in machine learning: “Among all the available classification algorithms, and considering a specific type of data and cost, which is the best algorithm for my problem?”

Design/methodology/approach The framework is based on the idea of applying machine learning techniques to discover knowledge about the performance of different machine learning algorithms. It includes components that repeatedly apply different classification methods to data sets and measure their performance. The characteristics of the data sets, combined with the algorithms and their performance, provide the training examples. A decision tree algorithm is applied to these training examples to induce the knowledge, which can then be used to recommend algorithms for new data sets. The paper contributes to both meta-learning and cost-sensitive machine learning. Neither field is new; the contribution is a recommender that suggests the optimal cost-sensitive approach for a given data problem. The proposed solution is implemented in WEKA and evaluated by applying it to different data sets and comparing the results with existing studies available in the literature. The developed solution takes the misclassification cost into consideration during the learning process, which the project it is compared with does not do.

Findings The results show that the developed meta-learning solution produces better results than METAL, a well-known meta-learning system.

Originality/value The paper presents new work published for the first time. Meta-learning has been studied before, but this paper presents a new meta-learning framework that is cost-sensitive.
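The pipeline described under Design/methodology/approach can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' WEKA implementation: the meta-features, the two toy candidate "algorithms", the helper names, and the four small data sets are hypothetical, each candidate is scored on its own training data instead of by repeated runs, and a one-level decision tree (a stump) stands in for the decision tree algorithm.

```python
from collections import Counter

def meta_features(X, y, cost):
    """Describe a data set and its costs (hypothetical choice of meta-features)."""
    counts = Counter(y)
    return {
        "n_instances": len(y),
        "n_attributes": len(X[0]),
        "minority_ratio": min(counts.values()) / len(y),
        # cost[actual][predicted]; ratio of false-negative to false-positive cost
        "cost_ratio": cost[1][0] / cost[0][1],
    }

def total_cost(y_true, y_pred, cost):
    """Sum the misclassification cost over all predictions."""
    return sum(cost[a][p] for a, p in zip(y_true, y_pred))

def majority(X, y, cost):
    """Cost-insensitive toy learner: always predict the most frequent class."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def min_expected_cost(X, y, cost):
    """Cost-sensitive toy learner: predict the class with the lowest expected cost."""
    counts, n = Counter(y), len(y)
    classes = sorted(counts)
    label = min(classes, key=lambda p: sum(counts[a] / n * cost[a][p] for a in classes))
    return lambda x: label

CANDIDATES = {"majority": majority, "min_expected_cost": min_expected_cost}

def build_meta_example(X, y, cost):
    """Run every candidate on one data set and label it with the cheapest one."""
    costs = {}
    for name, learner in CANDIDATES.items():
        model = learner(X, y, cost)
        # Simplification: scored on the training data, not on held-out folds.
        costs[name] = total_cost(y, [model(x) for x in X], cost)
    example = meta_features(X, y, cost)
    example["best_algorithm"] = min(costs, key=costs.get)  # ties go to the first listed
    return example

def fit_stump(examples, target="best_algorithm"):
    """One-level decision tree: the single threshold split with the fewest errors."""
    fallback = Counter(e[target] for e in examples).most_common(1)[0][0]
    best = None
    for f in (k for k in examples[0] if k != target):
        values = sorted({e[f] for e in examples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [e[target] for e in examples if e[f] <= t]
            right = [e[target] for e in examples if e[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no meta-feature varies, so recommend the most common label
        return lambda feats: fallback
    _, f, t, l_lab, r_lab = best
    return lambda feats: l_lab if feats[f] <= t else r_lab

# Four hypothetical data sets: (labels, cost matrix [[0, FP], [FN, 0]]).
DATASETS = [
    ([0] * 8 + [1] * 2, [[0, 1], [10, 0]]),
    ([0] * 8 + [1] * 2, [[0, 1], [1, 0]]),
    ([0] * 7 + [1] * 3, [[0, 1], [5, 0]]),
    ([0] * 6 + [1] * 4, [[0, 2], [1, 0]]),
]
meta_examples = [
    build_meta_example([[i] for i in range(len(y))], y, cost) for y, cost in DATASETS
]
recommend = fit_stump(meta_examples)

# Recommend an algorithm for a new, unseen data set described by its meta-features.
new_data_set = {"n_instances": 10, "n_attributes": 1, "minority_ratio": 0.1, "cost_ratio": 8.0}
print(recommend(new_data_set))
```

The point of the sketch is the shape of the training examples: each one pairs data-set characteristics and cost information with the algorithm that achieved the lowest misclassification cost, so the induced tree can recommend an algorithm without running every candidate on the new data set.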
