An Improved Two-Step Supervised Learning Artificial Neural Network for Imbalanced Dataset Problems

This paper proposes an improved two-step supervised learning algorithm for Artificial Neural Networks (ANN) applied to imbalanced dataset problems. Particle swarm optimization (PSO) serves as the ANN learning mechanism in both steps, and the fitness function in both steps is the Geometric Mean (G-Mean). In the first step, the best network weights are determined with the decision threshold fixed at 0.5. Once the first step is complete, these best weights are used for the second step of learning. The second step yields the best weights together with the best value of the decision threshold, which can then be used to classify an imbalanced dataset. Haberman's Survival dataset, available in the UCI Machine Learning Repository, is chosen as a case study, and G-Mean is used as the evaluation measure of classifier performance. The proposed approach is able to overcome imbalanced dataset problems, achieving a better G-Mean value than the previously proposed ANN.
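To make the role of G-Mean and the decision threshold concrete, the sketch below computes G-Mean under its usual definition, the square root of sensitivity times specificity, and then picks a threshold from a small candidate list. It is an illustration only: the paper tunes weights and threshold with PSO, whereas this sketch swaps in a plain scan over candidates, and the function names, labels, and scores are hypothetical toy values rather than anything taken from the paper.

```python
import math


def g_mean(y_true, scores, threshold=0.5):
    """G-Mean = sqrt(sensitivity * specificity); label 1 is the positive class."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == 0:
                tn += 1
            else:
                fp += 1
    # Guard against a class with no samples.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def best_threshold(y_true, scores, candidates):
    """Return the candidate threshold with the highest G-Mean.

    Assumption: a simple scan stands in for the PSO search used in the paper.
    """
    return max(candidates, key=lambda t: g_mean(y_true, scores, t))


# Hypothetical toy data: 2 minority (positive) and 6 majority (negative) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.10, 0.20, 0.30, 0.35, 0.15, 0.05]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

print(g_mean(y_true, scores, 0.5))
print(best_threshold(y_true, scores, candidates))
```

In this toy data every minority sample scores below 0.5, so the fixed threshold classifies everything as negative and G-Mean collapses to zero even though accuracy would be 75%. Lowering the threshold recovers the minority class, which is the motivation for tuning the threshold in a second step.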
