Iterative Metric Learning for Imbalance Data Classification

In many classification applications, such as software defect prediction and medical diagnosis, the amount of data in different categories varies significantly. Under such circumstances, a proper method for handling the imbalance among the data is essential. However, most existing methods focus on improving classifier performance rather than on finding an effective data space for classification. In this paper, we propose a method named Iterative Metric Learning (IML) to explore the correlations among imbalanced data and construct an effective data space for classification. Given imbalanced training data, it is important to select a subset of training samples for each testing sample. We therefore aim to find a more stable neighborhood for each testing sample using an iterative metric learning strategy. To evaluate the effectiveness of the proposed method, we conducted experiments on two groups of datasets, i.e., the NASA Metrics Data Program (NASA) datasets and the UCI Machine Learning Repository (UCI) datasets. Experimental results and comparisons with state-of-the-art methods show that the proposed method performs better.
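
The idea of iterating toward a stable neighborhood can be illustrated with a minimal sketch. The abstract does not specify the metric learner, the neighborhood size, or the stopping rule, so everything below is an assumption made for illustration: the metric is a simple diagonal one (features weighted by inverse within-class variance) standing in for whatever metric learning method IML actually uses, and the function names `learn_diagonal_metric`, `stable_neighborhood`, and `classify` are hypothetical.

```python
import numpy as np


def learn_diagonal_metric(X, y, eps=1e-6):
    """Stand-in metric learner (an assumption, not the paper's method):
    weight each feature by the inverse of its pooled within-class variance."""
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        members = X[y == label]
        within += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 / (within / len(X) + eps)


def stable_neighborhood(X, y, x_test, k=10, max_iter=20):
    """Alternate between learning a metric on the current training subset and
    re-selecting the k nearest training samples, until the subset stops
    changing or max_iter is reached."""
    subset = np.arange(len(X))  # start from all training samples
    for _ in range(max_iter):
        weights = learn_diagonal_metric(X[subset], y[subset])
        # Weighted squared distance from every training sample to x_test.
        dists = ((X - x_test) ** 2 * weights).sum(axis=1)
        new_subset = np.sort(np.argsort(dists, kind="stable")[:k])
        if np.array_equal(new_subset, subset):
            break  # neighborhood is stable
        subset = new_subset
    return subset


def classify(X, y, x_test, k=10):
    """Majority vote among the samples in the stable neighborhood."""
    subset = stable_neighborhood(X, y, x_test, k=k)
    labels, counts = np.unique(y[subset], return_counts=True)
    return labels[np.argmax(counts)]
```

Because the metric is re-learned only on the samples currently near the testing sample, the majority class far from that sample stops influencing the distance after the first iteration. That is one plausible reading of "a more stable neighborhood," not a confirmed description of the paper's procedure.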
