Evolutionary extreme learning machine with sparse cost matrix for imbalanced learning.

The extreme learning machine (ELM) is a popular machine learning technique for single-hidden-layer feed-forward neural networks. However, because it assumes equal misclassification costs, the conventional ELM fails to properly learn the characteristics of data with a skewed class distribution. In this paper, to enhance the representation of few-shot cases, we drop that assumption by assigning penalty factors to different classes and minimizing the cumulative classification cost. To this end, a case-weighting extreme learning machine is developed on a sparse cost matrix with a diagonal form. To make the method practical, we formulate a multi-objective optimization problem with respect to the penalty factors and solve it using an evolutionary algorithm combined with an error bound model. The proposed method thereby becomes an adaptive cost-sensitive learning method, guided by the relation between the generalization ability and the case-weighting factors. In a broad experimental study, our method achieves competitive results on benchmark datasets and on real-world datasets for software bug report identification.
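The case-weighting idea can be illustrated with a minimal sketch. It assumes the standard regularized weighted-ELM closed form, beta = (I/C + H^T W H)^{-1} H^T W T, where W is the diagonal cost matrix holding each sample's class penalty. The function names, the sigmoid activation, and the parameters `n_hidden`, `C`, and `seed` are illustrative assumptions, not the authors' implementation; the evolutionary search over penalty factors and the error bound model are not shown.

```python
import numpy as np

def _hidden(X, A, b):
    # Sigmoid activations of the random (untrained) hidden layer.
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def train_weighted_elm(X, y, class_penalty, n_hidden=50, C=1.0, seed=0):
    """Fit a case-weighted ELM (hypothetical helper for illustration).

    class_penalty: one penalty factor per class, in sorted class order.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Hidden-layer weights are drawn at random and never trained.
    A = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _hidden(X, A, b)
    # Targets coded as +1 for the true class and -1 otherwise.
    T = np.where(y[:, None] == classes[None, :], 1.0, -1.0)
    # Diagonal of the cost matrix: each sample carries its class penalty.
    w = np.asarray(class_penalty, dtype=float)[np.searchsorted(classes, y)]
    # H^T W without materializing the full diagonal matrix W.
    HtW = H.T * w
    # Closed-form output weights: (I/C + H^T W H)^{-1} H^T W T.
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return {"A": A, "b": b, "beta": beta, "classes": classes}

def predict(model, X):
    # Predicted class is the output node with the largest score.
    scores = _hidden(X, model["A"], model["b"]) @ model["beta"]
    return model["classes"][np.argmax(scores, axis=1)]
```

For a two-class problem with a 9:1 imbalance, a call such as `train_weighted_elm(X, y, [1.0, 9.0])` penalizes minority-class errors nine times as heavily. In the setting the abstract describes, such penalty factors would be the decision variables tuned by the evolutionary algorithm rather than fixed by hand.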
