Challenges of Feature Selection for Big Data Analytics

We are surrounded by huge amounts of large-scale, high-dimensional data, but many learning tasks require data of reduced dimensionality. Feature selection has proven effective in many applications by building simpler and more comprehensible models, improving learning performance, and preparing clean, understandable data. Some unique characteristics of big data, such as data velocity and data variety, present new challenges to the feature selection problem. In this article, the authors outline these challenges for big data analytics. To facilitate and promote feature selection research, they present an open source feature selection repository (scikit-feature) of popular algorithms.
