Exploring Feature-Level Duplications on Imbalanced Data Using Stochastic Diffusion Search

Stochastic diffusion search (SDS) is a swarm intelligence algorithm that applies processes and techniques observed in swarms to search and optimisation problems. This paper proposes a hybrid approach for handling real-world imbalanced data. The proposed model oversamples the minority class, undersamples the majority class, and optimises the parameters of the classifier, a Support Vector Machine (SVM). Oversampling is performed with the Synthetic Minority Over-sampling Technique (SMOTE), while the agents of SDS perform an ‘informed’ undersampling of the majority class. In addition to comparing the agent-led undersampling with random undersampling, the results are contrasted against other well-known techniques on nine real-world datasets. The behaviour of SDS agents in this context is also analysed.
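
The two resampling steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only the standard SMOTE interpolation idea (a synthetic point placed on the segment between a minority sample and one of its k nearest minority neighbours) and the random undersampling baseline that the agent-led method is compared against. The SDS-led undersampling and the SVM parameter optimisation are omitted because the abstract does not specify them. The function names `smote` and `random_undersample`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper: a simplified SMOTE-style sketch, not a library API.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Baseline: keep n_keep majority samples chosen uniformly at random."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


# Toy data (assumed): 3 minority points versus 30 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
majority = [(float(i), float(i % 7)) for i in range(5, 35)]

synthetic = smote(minority, n_new=7, k=2)
balanced_minority = minority + synthetic
balanced_majority = random_undersample(majority, n_keep=10)
print(len(balanced_minority), len(balanced_majority))
```

In this sketch both classes end up with ten samples. An 'informed' undersampler would replace `random_undersample` with a rule for choosing which majority samples to keep.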
