A New Feature Selection Method Based on a Self-Variant Genetic Algorithm Applied to Android Malware Detection

In solving classification problems in machine learning and pattern recognition, data pre-processing is particularly important. High-dimensional feature datasets increase the time and space complexity of computation and reduce the accuracy of classification models. Hence, a good feature selection method is essential. This paper presents a new feature selection algorithm that retains the selection and mutation operators of the traditional genetic algorithm. On the one hand, the global search capability of the algorithm is ensured by varying the population size; on the other hand, the optimal mutation probability for the feature selection problem is determined for different population sizes. During the iterations of the algorithm, the population size does not change, no matter how many transformations are applied, and remains equal to the initial population size; this spatial invariance is physically defined as symmetry. The proposed method is compared with other algorithms and validated on different datasets. The experimental results show that the algorithm performs well. In addition, we apply the algorithm to a practical Android software classification problem, and the results again show its superiority.
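The abstract does not spell out the operators in detail. The following is only a minimal sketch of a genetic algorithm that uses selection and mutation (no crossover) for feature selection while keeping the population size fixed across iterations. Binary feature masks, bit-flip mutation, tournament selection and one-individual elitism are all our assumptions, not the authors' stated design. `toy_fitness` and `RELEVANT` are hypothetical stand-ins for a classifier's validation accuracy. `tune_mutation_probability` is a hypothetical helper that illustrates picking a mutation probability separately for each population size.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = List[int]  # bit i == 1 means feature i is kept

def mutate(mask: Mask, p_m: float, rng: random.Random) -> Mask:
    """Bit-flip mutation (assumed): each gene flips independently with probability p_m."""
    return [1 - bit if rng.random() < p_m else bit for bit in mask]

def tournament_select(pop: List[Mask], fits: List[float], k: int,
                      rng: random.Random) -> Mask:
    """Assumed selection scheme: copy of the fittest of k distinct random individuals."""
    winner = max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])
    return pop[winner][:]

def mutation_only_ga(fitness: Callable[[Mask], float], n_features: int,
                     pop_size: int = 20, p_m: float = 0.05,
                     generations: int = 50, seed: int = 0
                     ) -> Tuple[Mask, List[float]]:
    """Selection + mutation only; returns best mask and best-so-far fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best: Mask = max(pop, key=fitness)[:]
    history: List[float] = []
    k = min(3, pop_size)  # tournament size (assumption)
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        # Assumed elitism: the current best individual survives unchanged.
        nxt = [pop[max(range(pop_size), key=lambda i: fits[i])][:]]
        while len(nxt) < pop_size:
            nxt.append(mutate(tournament_select(pop, fits, k, rng), p_m, rng))
        pop = nxt
        assert len(pop) == pop_size  # population size equals the initial size every iteration
        cand = max(pop, key=fitness)
        if fitness(cand) > fitness(best):
            best = cand[:]
        history.append(fitness(best))
    return best, history

def tune_mutation_probability(fitness: Callable[[Mask], float], n_features: int,
                              pop_sizes: Sequence[int], p_ms: Sequence[float],
                              **kw) -> Dict[int, float]:
    """Hypothetical grid search: for each population size, the p_m whose run ends fittest."""
    result: Dict[int, float] = {}
    for n in pop_sizes:
        scores = {p: fitness(mutation_only_ga(fitness, n_features, n, p, **kw)[0])
                  for p in p_ms}
        result[n] = max(scores, key=scores.get)
    return result

RELEVANT = {0, 1, 2}  # hypothetical "informative" features for the toy problem

def toy_fitness(mask: Mask) -> float:
    """Hypothetical stand-in for accuracy: reward relevant features, penalize subset size."""
    return sum(mask[i] for i in RELEVANT) - 0.1 * sum(mask)

if __name__ == "__main__":
    best, hist = mutation_only_ga(toy_fitness, n_features=10, pop_size=20, p_m=0.1)
    print(best, hist[-1])
    print(tune_mutation_probability(toy_fitness, 10, [10, 20], [0.01, 0.05, 0.1],
                                    generations=30))
```

In a real Android malware setting, `toy_fitness` would be replaced by something like the cross-validated accuracy of a classifier trained on the selected feature columns, possibly with a size penalty. That choice is also an assumption here.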
