A supervised learning approach for imbalanced data sets

This paper presents a new learning approach for pattern classification applications involving imbalanced data sets. In this approach, a clustering technique resamples the original training set into a smaller set of representative training exemplars, each consisting of a weighted cluster center and its target output. Based on this learning approach, four training algorithms are derived for feed-forward neural networks. These algorithms are implemented and tested on three benchmark data sets. Experimental results show that, with the proposed learning approach, it is possible to design networks that tackle the class imbalance problem without compromising overall classification performance.
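The abstract does not specify the clustering method, the weighting rule, or the four training algorithms, so the following is only a minimal sketch of the general idea under stated assumptions. It runs k-means separately on each class and weights each center by the fraction of its class that it represents. This is a hypothetical choice that gives every class the same total weight. In place of a full multilayer network, a single sigmoid unit is trained by gradient descent on a weighted squared error. The names `kmeans`, `reduce_training_set`, and `train_weighted_neuron` are illustrative only.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means. Returns [(center, size), ...] for non-empty clusters."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, min(k, len(points)))]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            # Assign p to the nearest center (squared Euclidean distance).
            j = min(range(len(centers)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return [(c, len(g)) for c, g in zip(centers, groups) if g]


def reduce_training_set(X, y, k_per_class):
    """Replace each class by at most k weighted cluster centers.

    Hypothetical weighting: w = cluster size / class size, so each class
    carries a total weight of 1 however many raw samples it had.
    """
    exemplars = []
    for label in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == label]
        for center, size in kmeans(pts, k_per_class):
            exemplars.append((center, label, size / len(pts)))
    return exemplars


def train_weighted_neuron(exemplars, lr=2.0, epochs=2000):
    """A single sigmoid unit (standing in for a feed-forward network), fitted
    by batch gradient descent on E = 1/2 * sum_i w_i * (t_i - o_i)^2."""
    dim = len(exemplars[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, t, weight in exemplars:
            o = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            delta = weight * (o - t) * o * (1.0 - o)  # dE/dz for this exemplar
            gw = [g + delta * xi for g, xi in zip(gw, x)]
            gb += delta
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy imbalanced problem: 200 majority samples vs. 10 minority samples.
rng = random.Random(1)
X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(200)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]
y = [0] * 200 + [1] * 10

exemplars = reduce_training_set(X, y, k_per_class=3)
model = train_weighted_neuron(exemplars)
print(len(exemplars), predict(model, [0, 0]), predict(model, [3, 3]))
```

In this sketch, the minority class is summarized by as many centers as the majority class and carries the same total weight, so the 200:10 imbalance no longer dominates the loss. Training also runs on at most six exemplars instead of 210 raw samples.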
