Large-Scale Online Feature Selection for Ultra-High Dimensional Sparse Data

Feature selection (FS) is an important technique in machine learning and data mining, especially for large-scale high-dimensional data. Most existing studies have been restricted to batch learning, which is often inefficient and scales poorly when handling big data in the real world. Because real data may arrive sequentially and continuously, batch learning has to retrain the model whenever new data arrive, which is computationally intensive. Online feature selection (OFS) is a promising new paradigm that is more efficient and scalable than batch learning algorithms. However, existing online algorithms often fall short in efficacy. In this article, we present a novel second-order OFS algorithm that is simple yet effective, very fast, and extremely scalable for large-scale ultra-high dimensional sparse data streams. The basic idea is to exploit second-order information to select the subset of important features with high confidence weights. Unlike existing OFS methods, which often suffer from high computational cost, we devise a novel algorithm with a MaxHeap-based approach that is not only more effective than existing first-order algorithms but also significantly more efficient and scalable. Our extensive experiments show that the proposed technique achieves highly competitive accuracy compared with state-of-the-art batch FS methods, while its computational cost is orders of magnitude lower. Notably, on a billion-scale synthetic dataset (1 billion dimensions, 1 billion non-zero features, and 1 million samples), the proposed algorithm runs in less than 3 minutes on a single PC.
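The combination of a second-order update with budgeted, confidence-based selection can be illustrated with a small Python sketch. It is not the article's algorithm. It assumes an AROW-style update (adaptive regularization of weight vectors) with a diagonal covariance, treats a smaller per-feature variance sigma_i as higher confidence, and keeps at most B nonzero weights. The class name `SecondOrderOFSSketch` and the parameters `budget` and `r` are hypothetical. In place of the MaxHeap-based procedure the article describes, the sketch simply calls `heapq.nsmallest` over the currently active features after each update.

```python
import heapq
from typing import Dict

# A sparse example is a dict mapping feature index -> feature value.
SparseVec = Dict[int, float]


class SecondOrderOFSSketch:
    """Hypothetical second-order OFS sketch (not the article's exact algorithm).

    Assumptions: AROW-style update with a diagonal covariance, and
    "confidence" read as a small per-feature variance sigma_i.
    """

    def __init__(self, budget: int, r: float = 1.0) -> None:
        self.budget = budget  # B: max number of nonzero weights (hypothetical name)
        self.r = r            # regularization constant of the AROW-style update (assumed)
        self.w: Dict[int, float] = {}      # active (nonzero) weights only
        self.sigma: Dict[int, float] = {}  # diagonal variances; unseen features default to 1.0

    def predict(self, x: SparseVec) -> float:
        """Linear score w^T x; only features present in x are touched."""
        return sum(self.w.get(i, 0.0) * v for i, v in x.items())

    def update(self, x: SparseVec, y: int) -> None:
        """One online step on example (x, y) with y in {-1, +1}."""
        margin = y * self.predict(x)
        if margin >= 1.0:
            return  # hinge loss is zero: no update
        # x^T Sigma x for a diagonal Sigma.
        conf = sum(self.sigma.get(i, 1.0) * v * v for i, v in x.items())
        beta = 1.0 / (conf + self.r)
        alpha = (1.0 - margin) * beta
        for i, v in x.items():
            s = self.sigma.get(i, 1.0)
            # Mean update: w <- w + alpha * y * Sigma x
            self.w[i] = self.w.get(i, 0.0) + alpha * y * s * v
            # Diagonal of Sigma - beta * Sigma x x^T Sigma; stays positive
            # because beta * s * v^2 < 1.
            self.sigma[i] = s - beta * (s * v) ** 2
        self._truncate()

    def _truncate(self) -> None:
        """Keep the B active features with the smallest variance (highest confidence)."""
        if len(self.w) <= self.budget:
            return
        # heapq.nsmallest selects the B smallest keys without a full sort;
        # ties are broken by feature index for determinism.
        keep = heapq.nsmallest(
            self.budget, self.w, key=lambda i: (self.sigma.get(i, 1.0), i)
        )
        self.w = {i: self.w[i] for i in keep}


if __name__ == "__main__":
    model = SecondOrderOFSSketch(budget=1)
    # Feature 0 occurs in every example, so its variance shrinks fastest
    # and it is the feature that survives truncation.
    stream = [({0: 1.0, 1: 1.0}, 1), ({0: -1.0, 2: 1.0}, -1)]
    for _ in range(5):
        for x, y in stream:
            model.update(x, y)
    print(sorted(model.w))  # prints [0]
```

Only features present in the current example change, so the active set never exceeds B plus the number of non-zeros in x, and each rescan stays small; an incrementally maintained heap could avoid even this rescan.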
