Cost-sensitive feature selection via the ℓ2,1-norm

Abstract Selecting a useful feature subset from a high-dimensional feature space is an essential step in data mining and machine learning. Many existing feature selection algorithms consider only precision and ignore both error types and test costs. In this paper, we use the ℓ2,1-norm to propose a cost-sensitive embedded feature selection algorithm that minimizes the total cost rather than maximizing accuracy. The algorithm jointly minimizes an ℓ2,1-norm loss function that incorporates misclassification costs, and this ℓ2,1-norm-based loss is robust to outliers. We also add an orthogonal constraint term to guarantee that the selected features are independent of one another. The proposed algorithm thus accounts for test costs and misclassification costs simultaneously. Finally, we derive an iterative updating algorithm for the objective function that makes cost-sensitive feature selection more efficient, yielding a formulation that is more realistic than existing feature selection algorithms. Extensive experiments on publicly available datasets demonstrate that the proposed algorithm is effective: it selects low-cost feature subsets and outperforms other feature selection algorithms in real-world applications.
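The abstract does not state the objective in closed form, so the following is a minimal sketch of what such a cost-sensitive ℓ2,1 formulation might look like: a loss whose per-sample terms are scaled by misclassification costs, an ℓ2,1 regularizer whose per-feature terms are scaled by test costs, and the standard iteratively reweighted update commonly used for joint ℓ2,1-norm minimization. The function names, the exact objective, and the cost-weighting scheme are illustrative assumptions, not the paper's method, and the paper's orthogonality constraint is omitted here for brevity.

```python
import numpy as np

def l21_norm(M):
    """l2,1-norm of a matrix: the sum of the l2-norms of its rows."""
    return np.linalg.norm(M, axis=1).sum()

def cost_sensitive_l21_select(X, Y, mis_cost, test_cost, gamma=1.0,
                              n_iter=50, eps=1e-8):
    """Hypothetical iteratively reweighted solver for the objective

        min_W  sum_i c_i * ||(X W - Y)_i||_2  +  gamma * sum_j t_j * ||W_j||_2

    where c_i is the misclassification cost of sample i and t_j is the
    test cost of feature j (both are assumptions of this sketch).
    """
    n, d = X.shape
    # ridge-style initialization, i.e., unit reweighting matrices
    W = np.linalg.solve(X.T @ X + gamma * np.eye(d), X.T @ Y)
    for _ in range(n_iter):
        R = X @ W - Y
        # diagonal reweighting terms from residual and weight row norms,
        # scaled by the per-sample and per-feature costs respectively
        d1 = mis_cost / (2.0 * np.linalg.norm(R, axis=1) + eps)
        d2 = test_cost / (2.0 * np.linalg.norm(W, axis=1) + eps)
        A = X.T @ (d1[:, None] * X) + gamma * np.diag(d2)
        W = np.linalg.solve(A, X.T @ (d1[:, None] * Y))
    # rank features by the l2-norm of the corresponding row of W
    ranking = np.argsort(-np.linalg.norm(W, axis=1))
    return ranking, W

# Example on synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = np.eye(3)[rng.integers(0, 3, 100)]        # one-hot labels
mis_cost = np.ones(100)                        # per-sample misclassification costs
test_cost = rng.uniform(0.1, 1.0, 20)          # per-feature test costs
ranking, W = cost_sensitive_l21_select(X, Y, mis_cost, test_cost)
print(ranking[:5])                             # top-ranked features
```

Because each feature's regularization weight is scaled by its test cost, an expensive feature needs proportionally larger predictive weight to survive the ranking, which is one plausible way to trade accuracy against acquisition cost; the eps term simply guards against division by zero when a residual or weight row vanishes.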
