Rough Sets for Handling Imbalanced Data: Combining Filtering and Rule-based Classifiers

The paper addresses the problem of improving the performance of rule-based classifiers constructed from imbalanced data sets, i.e., data sets in which the minority class, which is of primary importance, is under-represented in comparison to the majority classes. We introduce two techniques that detect and process inconsistent examples from the majority classes located in the boundary region between the minority and majority classes. The techniques differ in how they process these inconsistent boundary examples: the first removes them, while the second relabels them as belonging to the minority class. The experiments showed that the best results were obtained with the filtering technique that reassigns inconsistent majority class examples to the minority class, combined with a classifier composed of decision rules generated by the MODLEM algorithm.
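The two filtering variants described above can be illustrated with a minimal sketch. It assumes classical Pawlak rough sets: examples are grouped into elementary sets by identical values of all (already discretized) condition attributes, and a majority-class example counts as "inconsistent" when its elementary set also contains a minority example, i.e. it lies in the boundary region of the minority class. The abstract does not specify these details, so this definition, the function names (`elementary_sets`, `inconsistent_majority`, `filter_boundary`) and the `mode` parameter are hypothetical. Rule induction with MODLEM is not shown.

```python
from collections import defaultdict
from typing import Hashable, Sequence

# Assumption: each example is a tuple of nominal (discretized) attribute values.
Row = tuple[Hashable, ...]


def elementary_sets(rows: Sequence[Row]) -> list[list[int]]:
    """Group example indices by identical attribute values, i.e. the
    equivalence classes of the indiscernibility relation."""
    groups: dict[Row, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row)].append(i)
    return list(groups.values())


def inconsistent_majority(
    rows: Sequence[Row], labels: Sequence[Hashable], minority: Hashable
) -> list[int]:
    """Indices of majority-class examples in the boundary region of the
    minority class: elementary sets that mix minority and other labels."""
    flagged = []
    for block in elementary_sets(rows):
        block_labels = {labels[i] for i in block}
        if minority in block_labels and len(block_labels) > 1:
            flagged.extend(i for i in block if labels[i] != minority)
    return sorted(flagged)


def filter_boundary(
    rows: Sequence[Row],
    labels: Sequence[Hashable],
    minority: Hashable,
    mode: str = "relabel",
) -> tuple[list[Row], list[Hashable]]:
    """Return a filtered copy of the data (hypothetical helper).

    mode="remove":  drop inconsistent majority examples.
    mode="relabel": reassign them to the minority class (the variant the
                    abstract reports as giving the best results).
    """
    flagged = set(inconsistent_majority(rows, labels, minority))
    if mode == "remove":
        keep = [i for i in range(len(rows)) if i not in flagged]
        return [rows[i] for i in keep], [labels[i] for i in keep]
    if mode == "relabel":
        new_labels = [minority if i in flagged else y for i, y in enumerate(labels)]
        return list(rows), new_labels
    raise ValueError(f"unknown mode: {mode!r}")


# Toy data: examples 0 and 1 are indiscernible but carry different labels.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y"), ("c", "z")]
labels = ["min", "maj", "maj", "maj", "min"]
print(inconsistent_majority(rows, labels, "min"))  # [1]
print(filter_boundary(rows, labels, "min", mode="remove")[1])  # ['min', 'maj', 'maj', 'min']
print(filter_boundary(rows, labels, "min", mode="relabel")[1])  # ['min', 'min', 'maj', 'maj', 'min']
```

Only example 1 is flagged: its elementary set also contains a minority example, whereas examples 2 and 3 form a consistent majority block. Removing versus relabeling it is exactly the choice that separates the two techniques; the filtered data would then be passed to a rule learner.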
