Online sequential classification of imbalanced data by combining extreme learning machine and improved SMOTE algorithm

Data imbalance is becoming increasingly pronounced in machine learning and pattern recognition applications. Many traditional machine learning methods struggle with imbalanced data that are also collected in an online sequential manner. To classify such data quickly and efficiently, a new online sequential extreme learning machine (OS-ELM) method with a sequential SMOTE strategy is proposed. The key idea is to reduce the randomness in generating virtual minority samples by exploiting the distribution characteristics of the online sequential data. Using OS-ELM as the baseline algorithm, the method comprises two stages. In the offline stage, a principal curve is introduced to model each class's distribution, and virtual samples are generated on this basis by the synthetic minority over-sampling technique (SMOTE). In the online stage, the membership for each class is determined from the projection distance of a sample to the principal curve. With the help of these memberships, redundant majority samples and unreasonable virtual minority samples are excluded to reduce the imbalance level in the online stage. The proposed method is evaluated on four UCI datasets and a real-world air pollutant forecasting dataset. The experimental results show that the proposed method outperforms the classical ELM, OS-ELM, and SMOTE-based OS-ELM in terms of generalization performance and numerical stability.
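For orientation, the two building blocks named above can be sketched in a few lines of NumPy. This is a minimal illustration of plain SMOTE interpolation and the standard OS-ELM recursive least-squares update, not the proposed method: the principal-curve modeling and membership-based filtering are omitted. The names `smote` and `OSELM`, the sigmoid activation, the uniform weight initialization, and the small `ridge` term are all assumptions made for this sketch.

```python
import numpy as np


def smote(minority, n_new, k=5, rng=None):
    """Plain SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours. Needs at least two minority samples."""
    rng = np.random.default_rng(rng)
    X = np.asarray(minority, dtype=float)
    k = min(k, len(X) - 1)
    # Pairwise squared distances inside the minority class.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X), size=n_new)
    picks = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # random position on the connecting segment
    return X[base] + gap * (X[picks] - X[base])


class OSELM:
    """Baseline OS-ELM: fixed random hidden layer, output weights updated
    chunk by chunk with recursive least squares."""

    def __init__(self, n_inputs, n_hidden, n_outputs, rng=None):
        rng = np.random.default_rng(rng)
        # Assumed initialization: uniform random weights, never retrained.
        self.W = rng.uniform(-1.0, 1.0, (n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, n_hidden)
        self.P = None
        self.beta = np.zeros((n_hidden, n_outputs))

    def _hidden(self, X):
        # Sigmoid hidden-layer output matrix H.
        return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ self.W + self.b)))

    def fit_initial(self, X, T, ridge=1e-6):
        """Offline stage: solve for the output weights on the initial chunk.
        The ridge term is an assumption added to keep the inverse stable."""
        H = self._hidden(X)
        self.P = np.linalg.inv(H.T @ H + ridge * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ np.asarray(T, dtype=float)

    def partial_fit(self, X, T):
        """Online stage: update P and beta with one new chunk."""
        H = self._hidden(X)
        S = np.eye(len(H)) + H @ self.P @ H.T
        self.P = self.P - self.P @ H.T @ np.linalg.solve(S, H @ self.P)
        self.beta = self.beta + self.P @ H.T @ (np.asarray(T, dtype=float) - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

In a SMOTE-based OS-ELM baseline of this kind, the output of `smote` would be appended to the minority class before calling `fit_initial`, and later chunks would be passed to `partial_fit` as they arrive. The method in the abstract differs in how virtual samples are placed and in which samples are discarded online.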
