A Multiple Expert Approach to the Class Imbalance Problem Using Inverse Random under Sampling

In this paper, a novel inverse random under sampling (IRUS) method is proposed for the class imbalance problem. The main idea is to severely undersample the negative (majority) class, thus creating a large number of distinct negative training sets. For each training set, we then find a linear discriminant that separates the positive class from the negative class. By combining the multiple designs through voting, we construct a composite boundary between the positive class and the negative class. The proposed method is applied to 11 UCI data sets, and the experimental results indicate a significant increase in the area under the ROC curve (AUC) compared with many existing class-imbalance learning methods.
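The abstract describes the IRUS pipeline only at a high level, so the following is a minimal sketch of one way to read it, not the authors' implementation. It rests on several assumptions that the abstract does not state: Fisher's linear discriminant (with a small ridge term) serves as the per-set linear discriminant, every training set keeps all positive samples, each discriminant thresholds halfway between the projected class means, and the hypothetical parameters `subset_size` and `n_experts` are chosen by the user. The ensemble score is the fraction of experts voting positive, which is ranked directly to compute AUC. Only NumPy is required.

```python
import math

import numpy as np


def fisher_discriminant(X_pos, X_neg, reg=1e-3):
    """Fisher linear discriminant (assumed base learner): returns (w, b).

    The ridge term `reg` (an assumption) keeps the within-class scatter
    invertible, which matters because each negative subset is tiny.
    """
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    D_p, D_n = X_pos - mu_p, X_neg - mu_n
    # Within-class scatter: sum of the two class scatter matrices.
    S_w = D_p.T @ D_p + D_n.T @ D_n + reg * np.eye(X_pos.shape[1])
    w = np.linalg.solve(S_w, mu_p - mu_n)
    # Threshold halfway between the projected class means (an assumption).
    b = -0.5 * (mu_p + mu_n) @ w
    return w, b


def irus_train(X_pos, X_neg, subset_size, n_experts, seed=0):
    """Train one discriminant per distinct, severely undersampled negative set."""
    if math.comb(len(X_neg), subset_size) < n_experts:
        raise ValueError("not enough distinct negative subsets")
    rng = np.random.default_rng(seed)
    seen, experts = set(), []
    while len(experts) < n_experts:
        idx = rng.choice(len(X_neg), size=subset_size, replace=False)
        key = tuple(sorted(idx.tolist()))
        if key in seen:
            continue  # redraw so every negative training set is distinct
        seen.add(key)
        # All positives plus a small random subset of negatives.
        experts.append(fisher_discriminant(X_pos, X_neg[idx]))
    return experts


def irus_score(experts, X):
    """Fraction of experts voting positive (w.x + b > 0) for each row of X."""
    votes = np.stack([(X @ w + b) > 0 for w, b in experts])
    return votes.mean(axis=0)


def auc(scores_pos, scores_neg):
    """AUC as P(random positive outranks random negative); ties count 1/2."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X_pos = rng.normal(loc=1.5, size=(20, 2))   # minority (positive) class
    X_neg = rng.normal(loc=0.0, size=(400, 2))  # majority (negative) class
    experts = irus_train(X_pos, X_neg, subset_size=5, n_experts=101)
    s = irus_score(experts, np.vstack([X_pos, X_neg]))
    print("training AUC:", auc(s[:20], s[20:]))
```

In this sketch, `subset_size` is set below the number of positives, so within each training set the negatives become the minority, which is our reading of the "inverse" in IRUS. Because the vote fraction is a ranking score, AUC can be computed without fixing a single ensemble threshold.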
