Online feature importance ranking based on sensitivity analysis

Abstract Online learning is a growing branch of data mining that allows traditional data mining techniques to be applied to an online stream of data in real time. In this paper, we present a fast and efficient online sensitivity-based feature ranking method (SFR) that is updated incrementally. We build on the concept of global sensitivity and rank features by their impact on the outcome of the classification model. For feature selection, we use a two-stage filtering approach: the first stage eliminates highly correlated and redundant features, and the second stage eliminates irrelevant ones. An important advantage of our algorithm is its generality: the method works on correlated feature spaces without preprocessing, and it can be paired with any single-pass online classification method that produces a separating hyperplane, such as an SVM. Although the proposed method is developed primarily for online tasks, it achieves very significant experimental results in comparison with popular batch feature ranking/selection methods. We also perform experiments comparing the method with existing online feature ranking methods. The empirical results suggest that our method can be applied successfully in both batch and online learning.
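The abstract's core idea — ranking features by their influence on a separating-hyperplane classifier that is trained in a single online pass — can be illustrated with a minimal sketch. The sketch below is an assumption-laden simplification, not the authors' SFR algorithm: it trains a linear classifier with one sub-gradient step of the hinge loss per streamed example (in the spirit of single-pass online SVM solvers) and then uses the magnitude of each hyperplane weight as a crude proxy for the feature's sensitivity. The function names, learning rate, and toy data are all invented for illustration.

```python
import numpy as np

def train_online_hinge(stream, n_features, lr=0.1):
    """Single-pass online training: one hinge-loss sub-gradient
    step per streamed (x, y) example, y in {-1, +1}."""
    w = np.zeros(n_features)
    b = 0.0
    for x, y in stream:
        margin = y * (w @ x + b)
        if margin < 1.0:  # hinge loss active: update the hyperplane
            w += lr * y * x
            b += lr * y
    return w, b

def sensitivity_ranking(w):
    """Rank features by |w_j|, a simple proxy for how sensitive
    the hyperplane's decision is to feature j (largest first)."""
    return np.argsort(-np.abs(w))

# Toy stream: only feature 0 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0, 1, -1)

w, b = train_online_hinge(zip(X, y), n_features=3)
ranking = sensitivity_ranking(w)  # feature 0 should rank first
```

Because the classifier is updated example by example, the ranking can be recomputed after every instance at negligible cost, which is the property that makes this style of sensitivity ranking attractive for streaming data.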
