Classification of imbalanced data sets using Multi Objective Genetic Programming

Classification of imbalanced data sets is a challenging problem because it is difficult to achieve good classification accuracy for every class. The problem arises in many real-world applications, such as diagnosis of rare diseases, fraud detection in the financial domain, and detection of faulty areas in network troubleshooting. An imbalanced data set contains a small number of instances of the minority classes and a large number of instances of the majority classes. Overall classification accuracy is the ratio of correctly classified instances to the total number of instances in the data set. For imbalanced data sets, correct classification of minority class instances therefore contributes far less to overall classification accuracy than correct classification of majority class instances. Conventional classification techniques such as Artificial Neural Network (ANN), Support Vector Machine (SVM), and Naïve Bayes (NB) consider only the overall classification accuracy of the classifier and thus produce biased classifiers on imbalanced data sets. However, in many real-world data sets the minority class instances carry rare but important information, so a classification technique that provides good accuracy on both minority and majority classes is needed. This paper proposes a combination of Multi Objective Genetic Programming (MOGP) and a probability-based Gaussian classifier for classification of imbalanced data sets. MOGP treats the classification accuracy of each class as a separate objective, rather than treating overall accuracy as a single objective. The Gaussian classifier is a generative classifier in which the distribution of one class does not affect the classification of instances of other classes. The proposed methodology is applied to imbalanced data sets from the medical, life science, automobile, and space science domains.
The results suggest that the MOGP classifier outperforms the conventional classifiers (ANN, SVM, and NB) on the tested imbalanced data sets.
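
The contrast between overall accuracy and per-class accuracy can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`overall_accuracy`, `per_class_accuracy`, `dominates`), the 95/5 class split, and the two toy classifiers are assumptions made for illustration only. The sketch shows why a classifier that ignores the minority class can still score well on overall accuracy, and how treating each class accuracy as its own objective (compared here by Pareto dominance, the usual comparison in multi-objective optimization) keeps a more balanced classifier in contention.

```python
def overall_accuracy(y_true, y_pred):
    """Ratio of correctly classified instances to all instances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each class label in y_true."""
    result = {}
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        result[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    return result


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (higher is better)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


# Hypothetical data: 95 majority instances (label 0), 5 minority (label 1).
y_true = [0] * 95 + [1] * 5

# Toy classifier A: always predicts the majority class.
pred_a = [0] * 100
# Toy classifier B: 85 of 95 majority correct, 4 of 5 minority correct.
pred_b = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

acc_a = overall_accuracy(y_true, pred_a)
acc_b = overall_accuracy(y_true, pred_b)
pc_a = per_class_accuracy(y_true, pred_a)
pc_b = per_class_accuracy(y_true, pred_b)

# Each classifier's objective vector: (majority accuracy, minority accuracy).
obj_a = (pc_a[0], pc_a[1])
obj_b = (pc_b[0], pc_b[1])

print("overall:", acc_a, acc_b)
print("per class A:", pc_a)
print("per class B:", pc_b)
print("A dominates B:", dominates(obj_a, obj_b))
print("B dominates A:", dominates(obj_b, obj_a))
```

Under these assumed numbers, classifier A wins on overall accuracy even though it misclassifies every minority instance, while neither classifier dominates the other once the two class accuracies are compared as separate objectives.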
