OSFSMI: Online stream feature selection method based on mutual information

Abstract Feature selection is used to choose a subset of the most informative features in pattern identification based on machine learning methods. However, in many real-world applications such as online social networks, it is either impossible to acquire the entire feature set or impractical to wait for the complete set of features before starting the feature selection process. To handle this issue, online streaming feature selection approaches have recently been proposed as a complementary methodology that chooses the most informative features as they arrive. Most of these methods suffer from challenges such as high computational cost, the stability of the generated results, and the size of the final feature subset. In this paper, two novel feature selection methods called OSFSMI and OSFSMI-k are proposed to select the most informative features from online streaming features. The proposed methods employ the concept of mutual information in a streaming manner to evaluate the correlation between features and to assess the relevancy and redundancy of features in complex classification tasks. The proposed methods do not use any learning model in their search process, and thus can be classified as filter-based methods. Several experiments are performed to compare the performance of the proposed algorithms with state-of-the-art online streaming feature selection methods. The reported results show that the proposed methods perform better than the others in most cases.
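The relevancy and redundancy idea described in the abstract can be illustrated with a minimal sketch. This is not the OSFSMI or OSFSMI-k algorithm, whose details the abstract does not give. It is a generic streaming filter written under stated assumptions: features are discrete, mutual information is estimated from empirical counts, and a feature is kept only if its relevance to the class labels exceeds its mean redundancy with the features already selected. The names `mutual_information`, `stream_select`, and `relevance_min` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of x
    py = Counter(ys)            # marginal counts of y
    pxy = Counter(zip(xs, ys))  # joint counts of (x, y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def stream_select(feature_stream, labels, relevance_min=0.05):
    """Hypothetical streaming filter, NOT the paper's method.

    Features arrive one at a time as (name, values) pairs. A feature is kept
    when (a) its mutual information with the labels reaches relevance_min and
    (b) that relevance exceeds its mean mutual information with the features
    selected so far (an assumed redundancy criterion).
    """
    selected = {}
    for name, values in feature_stream:
        relevance = mutual_information(values, labels)
        if relevance < relevance_min:
            continue  # irrelevant to the class labels
        if selected:
            redundancy = sum(
                mutual_information(values, kept) for kept in selected.values()
            ) / len(selected)
        else:
            redundancy = 0.0
        if relevance - redundancy > 0:
            selected[name] = values
    return list(selected)


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1, 1, 0]
    stream = [
        ("f1", [0, 0, 1, 1, 0, 1, 1, 0]),  # identical to the labels
        ("f2", [0, 0, 1, 1, 0, 1, 1, 0]),  # exact duplicate of f1
        ("f3", [0, 1, 0, 1, 0, 1, 0, 1]),  # independent of the labels
    ]
    print(stream_select(stream, labels))
```

In this toy stream, `f2` is discarded as fully redundant with `f1`, and `f3` is discarded as irrelevant. No classifier is trained at any point, which is what makes a method of this kind a filter rather than a wrapper.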
