Broaden the minority class space for decision tree induction using antigen-derived detectors

Previous over-sampling methods learn only the minority class space when producing minority class examples. This paper instead broadens the minority class space by learning both the majority class space and the minority class space. To deal with the lack of density in imbalanced datasets, a Negative Selection Over-Sampling Technology (NSOTE) is proposed. NSOTE is based on the negative selection mechanism of the human immune system. It generates antigen-derived detectors from majority class examples to enrich the decision regions of the minority class space. Meanwhile, by learning the density distribution of the minority class examples, NSOTE eliminates noise detectors that deviate from the minority class space. We investigate the performance of NSOTE and previous over-sampling methods on artificial and real datasets, and the experimental results show that NSOTE can achieve better performance than existing resampling methods.
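The two steps described in the abstract (generating detectors against the majority class, then discarding detectors that stray from the minority class) can be illustrated with a rough sketch. This is not the authors' NSOTE implementation. It assumes a generic real-valued negative selection scheme in which candidates are drawn uniformly from the bounding box of the data, and it replaces the learned density distribution with a simple nearest-neighbour distance check. The function name `negative_selection_oversample` and the parameters `self_radius`, `minority_radius`, and `max_tries` are hypothetical.

```python
import math
import random


def negative_selection_oversample(majority, minority, n_new,
                                  self_radius, minority_radius,
                                  seed=0, max_tries=100_000):
    """Return up to n_new synthetic minority points (illustrative only).

    Assumptions (not taken from the paper):
      * candidates are sampled uniformly inside the data's bounding box;
      * a candidate "matches self" if it lies within self_radius of any
        majority example, and is then rejected (negative selection);
      * a candidate is treated as a noise detector if its nearest minority
        example is farther away than minority_radius.
    """
    rng = random.Random(seed)
    points = list(majority) + list(minority)
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_new:
            break
        candidate = tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
        # Negative selection: drop candidates that cover majority examples.
        if any(math.dist(candidate, m) <= self_radius for m in majority):
            continue
        # Noise filter: drop candidates far from every minority example.
        if min(math.dist(candidate, m) for m in minority) > minority_radius:
            continue
        detectors.append(candidate)
    return detectors


# Toy data: a majority cluster near the origin and two minority examples.
majority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
minority = [(5.0, 5.0), (5.0, 6.0)]
synthetic = negative_selection_oversample(
    majority, minority, n_new=5, self_radius=1.0, minority_radius=1.0)
print(len(synthetic), "synthetic minority examples generated")
```

The surviving detectors would be added to the training set as extra minority class examples before a decision tree is induced. A real implementation would need a principled way to choose both radii, which this sketch leaves to the caller.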
