Online group streaming feature selection considering feature interaction

Abstract In real-world applications such as image analysis and physical examination, features can be generated continuously, either one at a time or in groups. Online streaming feature selection processes such features on the fly, as they arrive. Existing streaming feature selection methods focus on removing irrelevant and redundant features and keeping the most relevant ones, but they ignore interactions between features. Interacting features appear irrelevant or only weakly relevant to the class when considered individually, yet in combination they may correlate strongly with the class. Features within the same group are more likely to interact with each other. In this paper, we therefore focus on feature interaction within and between streaming groups and propose an Online Group Streaming Feature Selection method that selects Features that Interact with each other, named OGSFS-FI. OGSFS-FI consists of two stages: online intra-group selection and online inter-group selection. For intra-group selection, we design a new pair selection strategy that selects features that interact with each other. For inter-group selection, we use the elastic net, a regularization and variable selection method that encourages a grouping effect. Extensive experiments on synthetic and real-world datasets demonstrate the efficiency and effectiveness of the proposed method.
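
The notion of interacting features described in the abstract can be illustrated with a small sketch. It is not the OGSFS-FI pair selection strategy, whose details the abstract does not give. It is a toy XOR example, chosen here as an assumption, in which each feature alone carries no information about the class while the pair determines it completely. The helper `mutual_information` and the quantity `interaction_gain` (joint mutual information minus the two individual terms) are illustrative names, not taken from the paper.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Toy data (assumed for illustration): the full XOR truth table.
rows = list(product([0, 1], repeat=2))
f1 = [a for a, _ in rows]
f2 = [b for _, b in rows]
label = [a ^ b for a, b in rows]

# Each feature alone looks irrelevant to the class.
mi_f1 = mutual_information(f1, label)
mi_f2 = mutual_information(f2, label)

# Treated as a pair, the two features determine the class.
mi_pair = mutual_information(list(zip(f1, f2)), label)

# Positive values indicate that the pair is more informative than
# the sum of its parts, i.e. the features interact.
interaction_gain = mi_pair - mi_f1 - mi_f2

print(f"I(f1; y) = {mi_f1:.3f} bits")
print(f"I(f2; y) = {mi_f2:.3f} bits")
print(f"I(f1, f2; y) = {mi_pair:.3f} bits")
print(f"interaction gain = {interaction_gain:.3f} bits")
```

A selector that scores features one at a time would discard both `f1` and `f2` here, which is the failure mode that motivates evaluating features in pairs.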
