Automated and Efficient Sparsity-based Feature Selection via a Dual-component Vector

Sparsity-based feature selection has recently gained much attention due to its good performance and potentially high interpretability. Most existing sparsity-based approaches rank the original features by their coefficients. This ranking mechanism requires a pre-defined number of features to select, which is usually unknown, and it risks selecting top-ranked but redundant features. In this paper, we address both issues by proposing a dual-component vector in which one component represents the coefficients of the features and the other determines the decision (selected/discarded) for each feature. While the former exposes relevant features through high coefficients, the latter automatically determines the number of selected features and avoids selecting redundant ones. The dual-component vector is optimised by a population-based optimisation approach that uses simple vector operations (addition/subtraction), which are much more efficient than the complex matrix operations (multiplication/inversion) used by existing sparsity-based approaches. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed algorithm is superior in effectiveness and efficiency to well-known and state-of-the-art feature selection approaches.
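
To make the dual-component idea concrete, the following is a minimal sketch rather than the algorithm of the paper. It assumes a flat vector of length 2d whose first half holds the coefficients and whose second half holds decision values, with a feature counted as selected when its decision value exceeds a threshold of 0.5. The names `decode` and `difference_step`, the threshold, and the differential-style update are all hypothetical choices made for illustration; the update stands in for whichever population-based optimiser is actually used and only shows that such a step needs nothing beyond element-wise addition and subtraction.

```python
from typing import List, Tuple


def decode(vector: List[float], threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Split a dual-component vector into effective coefficients and selected indices.

    Assumed layout (hypothetical): the first d entries are coefficients,
    the last d entries are decision values.
    """
    d = len(vector) // 2
    coefficients, decisions = vector[:d], vector[d:]
    # A feature is kept only if its decision value passes the threshold,
    # so the number of selected features is not fixed in advance.
    selected = [j for j in range(d) if decisions[j] > threshold]
    # Discarded features contribute a zero coefficient.
    effective = [coefficients[j] if decisions[j] > threshold else 0.0 for j in range(d)]
    return effective, selected


def difference_step(a: List[float], b: List[float], c: List[float],
                    scale: float = 0.5) -> List[float]:
    """Build a candidate vector as a + scale * (b - c), element by element.

    This is a generic differential-style update used purely as a stand-in.
    """
    return [ai + scale * (bi - ci) for ai, bi, ci in zip(a, b, c)]


# Three features: coefficients [0.9, -0.1, 0.7], decision values [0.8, 0.2, 0.6].
vector = [0.9, -0.1, 0.7, 0.8, 0.2, 0.6]
effective, selected = decode(vector)
print(selected)   # [0, 2]
print(effective)  # [0.9, 0.0, 0.7]
```

In this sketch the second feature is dropped because its decision value is below the threshold, regardless of how its coefficient would have ranked, which is how a decision component can fix the subset size without a pre-defined count.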