A Novel Class Imbalance Learning Method using Neural Networks

In data mining and knowledge discovery, hidden and valuable knowledge is extracted from data sources. The traditional algorithms used for knowledge discovery are bottlenecked by the wide range of available data sources. Class imbalance is one problem that arises from such data sources when they provide unequal classes, i.e., when examples of one class in a training data set vastly outnumber examples of the other class(es). In this paper, we present a new hybrid approach using neural networks to improve results on class-imbalanced data. The algorithm provides a simpler and faster alternative by using a multilayer perceptron trained with backpropagation as the base algorithm. We conduct experiments on eleven UCI data sets from various application domains, using four base learners and five evaluation metrics. Experimental results show that our method performs better than many existing class imbalance learning methods in terms of Area Under the ROC Curve (AUC), F-measure, precision, TP rate and TN rate.
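
The abstract names five evaluation metrics without defining them. The sketch below shows how they are conventionally computed for a binary problem. It assumes label 1 is the minority (positive) class, and the helper names (`confusion_counts`, `threshold_metrics`, `auc`) are hypothetical. This is an illustration of the standard metrics only, not the paper's hybrid neural-network method or its evaluation code. TP rate is also known as recall or sensitivity, and TN rate as specificity. AUC is computed here as the probability that a randomly chosen positive example is scored above a randomly chosen negative one.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN); assumes label 1 is the minority (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Precision, TP rate, TN rate and F-measure from hard 0/1 predictions."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0  # recall / sensitivity
    tn_rate = tn / (tn + fp) if tn + fp else 0.0  # specificity
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F-measure (F1): harmonic mean of precision and TP rate
    f_measure = (2 * precision * tp_rate / (precision + tp_rate)
                 if precision + tp_rate else 0.0)
    return {"precision": precision, "tp_rate": tp_rate,
            "tn_rate": tn_rate, "f_measure": f_measure}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney statistic: P(positive score > negative score), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:          # O(len(pos) * len(neg)); fine for an illustration
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 2 minority examples vs. 8 majority examples.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

    # Always predicting the majority class gives 80% accuracy but a TP rate of 0.0.
    print(threshold_metrics(y_true, [0] * 10))

    # Hypothetical classifier scores, thresholded at an arbitrary 0.45.
    scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1, 0.35, 0.05, 0.6, 0.15]
    y_pred = [1 if s >= 0.45 else 0 for s in scores]
    print(threshold_metrics(y_true, y_pred))
    print(auc(y_true, scores))
```

The majority-class baseline shows why accuracy alone is misleading on imbalanced data: it ignores the minority class entirely and still scores well. Threshold-independent AUC and the per-class TP and TN rates expose that failure.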
