An Ensemble Pruning Approach Based on Reinforcement Learning in Presence of Multi-class Imbalanced Data

In recent years, learning from imbalanced data sets has become a challenging problem in the machine learning and data mining communities. The problem arises when some classes have fewer instances than others. Multi-class imbalanced data sets are common in real-world applications, and many standard machine learning algorithms have difficulty handling them. In this paper, we propose an ensemble pruning approach based on the Reinforcement Learning framework. Inspired by the Markov Decision Process, we treat ensemble pruning as a one-player game and select the best classifiers from our initial state space. The selected classifiers, which together can produce a good ensemble model, are then used to learn from multi-class imbalanced data sets. Our experimental results on several UCI and KEEL benchmark data sets show promising improvements in minority class recall, G-mean, and MAUC.
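To make two of the reported measures concrete, the following minimal sketch computes per-class recall, minority class recall, and G-mean (taken here as the geometric mean of the per-class recalls, a common multi-class definition) on a toy label set. The function names and the toy data are illustrative assumptions, not the paper's implementation or results.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def g_mean(recalls):
    """Geometric mean of the per-class recalls (assumed G-mean definition)."""
    values = list(recalls.values())
    return math.prod(values) ** (1.0 / len(values))


def minority_recall(y_true, recalls):
    """Recall of the class with the fewest instances in y_true."""
    totals = Counter(y_true)
    minority = min(totals, key=totals.get)
    return recalls[minority]


# Hypothetical toy data: class 0 has 6 instances, class 1 has 4, class 2 has 2.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 2, 0]

recalls = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("per-class recall:", recalls)
print("accuracy:", accuracy)
print("minority class recall:", minority_recall(y_true, recalls))
print("G-mean:", g_mean(recalls))
```

On this toy set the overall accuracy is 0.75 while the minority class recall is only 0.5, which illustrates why accuracy alone can hide poor performance on small classes.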
