Measure oriented training: a targeted approach to imbalanced classification problems

Since the overall prediction error of a classifier on imbalanced problems can be misleading and biased, alternative performance measures such as G-mean and F-measure have been widely adopted. Various techniques, including sampling and cost-sensitive learning, are often employed to improve the performance of classifiers in such situations. However, the training process of classifiers is still largely driven by traditional error-based objective functions. As a result, there is a clear gap between the measure by which the classifier is evaluated and the objective by which it is trained. This paper investigates the prospect of bridging this gap by explicitly using the appropriate measure itself to search the hypothesis space. In the case studies, a standard three-layer neural network is used as the classifier, which is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method can achieve consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
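To make the idea concrete, the following is a minimal sketch of measure oriented training, not the implementation used in the paper. It assumes the common binary definition of G-mean (the square root of sensitivity times specificity, with label 1 as the minority class). The function names (`g_mean`, `predict`, `evolve`), the network layout, and the GA operators (truncation selection, uniform crossover, Gaussian mutation) and their parameter values are all illustrative assumptions; the abstract does not specify them.

```python
import math
import random


def g_mean(y_true, y_pred):
    """G-mean for binary labels, assuming 1 = minority (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (1 minus the overall error)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(weights, x, n_hidden):
    """Three-layer network: inputs -> tanh hidden units -> thresholded output.

    Assumed flat layout: for each hidden unit, len(x) input weights followed
    by a bias; then n_hidden output weights followed by an output bias.
    """
    n_in = len(x)
    out_start = n_hidden * (n_in + 1)
    total = weights[out_start + n_hidden]  # output bias
    for h in range(n_hidden):
        base = h * (n_in + 1)
        s = weights[base + n_in] + sum(weights[base + j] * x[j] for j in range(n_in))
        total += weights[out_start + h] * math.tanh(s)
    return 1 if total > 0 else 0


def evolve(X, y, n_hidden=3, pop_size=30, generations=40, sigma=0.3, seed=0):
    """Evolve network weights with G-mean on (X, y) as the fitness function."""
    rng = random.Random(seed)
    n_weights = n_hidden * (len(X[0]) + 1) + n_hidden + 1

    def fitness(w):
        return g_mean(y, [predict(w, x, n_hidden) for x in X])

    pop = [[rng.gauss(0, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection, elite kept as-is
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation of every gene.
            children.append([rng.choice(g) + rng.gauss(0, sigma) for g in zip(a, b)])
        pop = elite + children
    return max(pop, key=fitness)


# Why accuracy misleads: predicting the majority class for everything.
y_true = [1] + [0] * 9
y_majority = [0] * 10
print(accuracy(y_true, y_majority), g_mean(y_true, y_majority))

# Toy imbalanced problem (synthetic data, for illustration only).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(54)]
X += [[data_rng.gauss(2.5, 0.5), data_rng.gauss(2.5, 0.5)] for _ in range(6)]
y = [0] * 54 + [1] * 6
best = evolve(X, y)
print(g_mean(y, [predict(best, x, 3) for x in X]))
```

Because the GA only needs a fitness value per candidate, the objective does not have to be differentiable, which is what allows a measure such as G-mean to drive the search directly. Swapping in F-measure would only require changing the `fitness` function in this sketch.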
