Online Feature Selection for Streaming Features with High Redundancy Using Sliding-Window Sampling

In recent years, online feature selection has received much attention in data mining. Its aim is to reduce the dimensionality of streaming features by removing irrelevant and redundant features in real time. Existing methods such as Alpha-investing, OSFS, and SAOLA serve this purpose but have drawbacks: low prediction accuracy, a large number of selected features, and overflow of the streaming features when those features are highly correlated with one another. To overcome these drawbacks, we propose an online learning algorithm, named OSFSW, that uses a sliding-window strategy to sample streaming features in real time and applies conditional independence analysis to discard irrelevant and redundant features. With OSFSW, we obtain an approximate Markov blanket that contains fewer selected features while retaining high prediction accuracy. To validate the algorithm, we implement it and test its performance on widely used benchmark datasets from NIPS 2003 and the Causality Workbench. Extensive experimental results demonstrate that OSFSW significantly improves prediction accuracy and selects fewer features compared with Alpha-investing, OSFS, and SAOLA.
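The general idea of window-based selection with conditional independence tests can be illustrated with a minimal sketch. It is not the OSFSW algorithm itself, whose details are not given here. The sketch assumes continuous data, a Fisher z-test on partial correlations as the independence test, and hypothetical parameters `window`, `alpha`, and `max_cond`; all function names are illustrative.

```python
import math
from itertools import combinations

import numpy as np


def ci_pvalue(x, y, Z=None):
    """p-value of a Fisher z-test that x and y are independent given columns of Z."""
    n, k = len(x), 0
    if Z is not None and Z.shape[1] > 0:
        k = Z.shape[1]
        A = np.column_stack([np.ones(n), Z])
        # Partial correlation: correlate the least-squares residuals.
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r = np.corrcoef(x, y)[0, 1]
    if not np.isfinite(r):
        return 1.0  # degenerate (constant) input: treat as independent
    r = float(np.clip(r, -0.999999, 0.999999))
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def is_redundant(col, y, others, alpha, max_cond):
    """True if some small subset of `others` makes col independent of y."""
    for size in range(1, min(max_cond, len(others)) + 1):
        for subset in combinations(others, size):
            if ci_pvalue(col, y, np.column_stack(subset)) > alpha:
                return True
    return False


def sliding_window_select(feature_stream, y, window=5, alpha=0.01, max_cond=2):
    """Buffer arriving features; on each full window, filter and re-check."""
    kept = {}    # feature name -> column, in arrival order
    buffer = []  # current window of (name, column) pairs

    def flush():
        # Relevance: keep window features that are dependent on the target.
        for name, col in buffer:
            if ci_pvalue(col, y) <= alpha:
                kept[name] = col
        buffer.clear()
        # Redundancy: drop kept features explained by other kept features.
        for name in list(kept):
            others = [kept[m] for m in kept if m != name]
            if is_redundant(kept[name], y, others, alpha, max_cond):
                del kept[name]

    for name, col in feature_stream:
        buffer.append((name, col))
        if len(buffer) == window:
            flush()
    if buffer:  # process a final, partially filled window
        flush()
    return list(kept)


# Synthetic stream (assumed data): r1 is a noisy copy of x1, so it becomes
# redundant once x1 arrives in a later window.
rng = np.random.default_rng(0)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = x1 + x2 + 0.5 * rng.normal(size=n)
stream = [("r1", x1 + rng.normal(size=n))]
stream += [(f"noise{i}", rng.normal(size=n)) for i in range(4)]
stream += [("x1", x1), ("x2", x2)]
selected = sliding_window_select(iter(stream), y, window=3)
print(selected)
```

In this sketch the redundancy pass re-examines every previously kept feature after each window, so a feature accepted early can still be discarded when a more informative one arrives later. Bounding the conditioning-set size with `max_cond` keeps the number of tests per window manageable, at the cost of an approximate rather than exact Markov blanket.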