Truncated Gradient Confidence-Weighted Based Online Learning for Imbalance Streaming Data

Online learning for imbalanced streaming data is an important and challenging problem for many classification tasks in machine learning research. Traditional online learning algorithms mainly focus on classification tasks with balanced data and give little consideration to the characteristics of imbalanced streaming data. In this paper, we propose a novel online learning algorithm called Truncated Gradient Confidence-Weighted (TGCW), which integrates the truncated gradient algorithm with the confidence-weighted algorithm to improve feature selection while effectively reducing the dimensionality of imbalanced streaming data. We study a number of classification tasks with various class-imbalance ratios, including a pedestrian detection application, and compare the performance of TGCW with traditional online learning algorithms. The empirical results show that TGCW consistently achieves better performance than the baseline approaches.
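The abstract names the two ingredients of TGCW but does not give its update equations. The sketch below is a hedged illustration of how such a combination might look, and it is not the TGCW update from the paper. It swaps in diagonal AROW, a simpler relative of confidence-weighted learning that keeps a per-feature mean `mu` and variance `sigma`. It then applies the truncation operator from truncated gradient every `k` examples, which zeroes out small weights and thereby performs feature selection. The class name `TruncatedAROW` and the parameters `r`, `gravity` (standing in for the learning rate times gravity), `theta`, and `k` are all hypothetical. The sketch also makes no attempt to handle class imbalance.

```python
from typing import List


def truncate(w: float, alpha: float, theta: float) -> float:
    """Truncation operator: shrink weights with |w| <= theta toward 0 by alpha, never crossing 0."""
    if 0.0 <= w <= theta:
        return max(0.0, w - alpha)
    if -theta <= w < 0.0:
        return min(0.0, w + alpha)
    return w  # large weights are left untouched


class TruncatedAROW:
    """Hypothetical sketch: diagonal AROW (a confidence-weighted variant) plus periodic truncation."""

    def __init__(self, dim: int, r: float = 1.0, gravity: float = 0.01,
                 theta: float = 0.1, k: int = 10) -> None:
        self.mu = [0.0] * dim      # mean of the weight distribution
        self.sigma = [1.0] * dim   # diagonal covariance: per-feature uncertainty
        self.r = r                 # AROW regularization parameter
        self.gravity = gravity     # truncation amount per example (assumed to absorb the learning rate)
        self.theta = theta         # only weights with |w| <= theta are truncated
        self.k = k                 # apply truncation every k examples
        self.t = 0                 # number of examples seen

    def predict(self, x: List[float]) -> int:
        score = sum(m * xi for m, xi in zip(self.mu, x))
        return 1 if score >= 0 else -1

    def update(self, x: List[float], y: int) -> None:
        """Process one labelled example (y in {-1, +1}) from the stream."""
        self.t += 1
        margin = sum(m * xi for m, xi in zip(self.mu, x))
        confidence = sum(s * xi * xi for s, xi in zip(self.sigma, x))
        loss = max(0.0, 1.0 - y * margin)  # hinge loss
        if loss > 0.0:
            beta = 1.0 / (confidence + self.r)
            alpha = loss * beta
            for i, xi in enumerate(x):
                # Mean moves more along features the model is still unsure about.
                self.mu[i] += alpha * y * self.sigma[i] * xi
                # Variance shrinks for features observed in this example.
                self.sigma[i] -= beta * (self.sigma[i] * xi) ** 2
        if self.t % self.k == 0:
            # Truncated-gradient step: push small weights to exactly zero.
            shrink = self.k * self.gravity
            self.mu = [truncate(m, shrink, self.theta) for m in self.mu]


# Toy stream: feature 0 carries the label, feature 1 is tiny deterministic noise.
model = TruncatedAROW(dim=2)
for i in range(50):
    y = 1 if i % 2 == 0 else -1
    model.update([float(y), 0.01 * ((i % 3) - 1)], y)

print(model.mu)  # the noise weight mu[1] has been truncated to 0.0
```

In this toy stream the informative weight grows past `theta` and survives, while the weight on the noise feature stays small and is zeroed by truncation. This is the kind of sparsification the abstract attributes to combining the two methods.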