Online Feature Selection for Streaming Features Using Self-Adaption Sliding-Window Sampling

In recent years, online feature selection has become an active research topic in streaming feature mining, because it reduces the dimensionality of streaming features by removing irrelevant and redundant features in real time. Representative approaches to online feature selection with streaming features include alpha-investing, online streaming feature selection (OSFS), and the scalable and accurate online approach (SAOLA) for feature selection. Among these, alpha-investing has limited prediction accuracy and selects a large number of features. SAOLA sometimes offers outstanding running time and prediction accuracy but also selects a large number of features. OSFS offers high prediction accuracy on many datasets, but its running time increases exponentially as the number of features with low redundancy and high relevance grows. To address these limitations, we propose an online learning algorithm named OSFASW, which samples streaming features in real time with a self-adaption sliding window and discards irrelevant and redundant features based on conditional independence. OSFASW obtains an approximate Markov blanket with high prediction accuracy while reducing the number of selected features. The efficiency of OSFASW was validated in performance tests on widely used datasets, e.g., NIPS 2003 and the Causality Workbench. Extensive experimental results demonstrate that OSFASW significantly improves prediction accuracy and selects fewer features than alpha-investing, OSFS, and SAOLA.
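The general idea of buffering arriving features in a window and pruning them with conditional independence tests can be illustrated with a minimal sketch. This is not the proposed algorithm: the window size here is fixed rather than self-adapting, the independence test is a Fisher z-test on partial correlation (an assumption suited to roughly linear-Gaussian data), and redundancy is checked against one already-selected feature at a time. The names `fisher_z_independent`, `select_streaming`, `window_size`, and `alpha` are hypothetical.

```python
import math
import numpy as np


def fisher_z_independent(x, y, z=None, alpha=0.05):
    """Return True if x and y look independent (given z, if provided).

    Illustrative test only: Fisher z-test on (partial) correlation.
    """
    n = len(x)
    if z is None:
        r = np.corrcoef(x, y)[0, 1]
        k = 0
    else:
        # Regress z (plus an intercept) out of x and y, then correlate residuals.
        design = np.column_stack([np.ones(n), z])
        rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
        ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
        r = np.corrcoef(rx, ry)[0, 1]
        k = z.shape[1]
    r = float(np.clip(r, -0.999999, 0.999999))
    stat = math.sqrt(n - k - 3) * abs(0.5 * math.log((1 + r) / (1 - r)))
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


def select_streaming(features, target, window_size=5, alpha=0.05):
    """Select features from a stream of (name, column) pairs.

    Assumption: a fixed-size window stands in for a self-adapting one.
    """
    selected = {}  # name -> column of features kept so far
    window = []    # relevant features waiting to be checked for redundancy

    def flush():
        for name, col in window:
            # Redundant if the target is independent of the candidate
            # given some single already-selected feature.
            redundant = any(
                fisher_z_independent(col, target, s[:, None], alpha)
                for s in selected.values()
            )
            if not redundant:
                selected[name] = col
        window.clear()

    for name, col in features:
        if fisher_z_independent(col, target, None, alpha):
            continue  # irrelevant: marginally independent of the target
        window.append((name, col))
        if len(window) >= window_size:
            flush()
    flush()  # process whatever is left when the stream ends
    return list(selected)
```

Calling `select_streaming(stream, y)` with `stream` yielding `(name, numpy_column)` pairs returns the names of the retained features in arrival order. Conditioning on single selected features keeps the sketch short; conditioning on subsets of the selected set would be closer to a Markov blanket search, at higher cost.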
