Obfuscated computer virus detection using machine learning algorithm

Computer virus attacks are becoming increasingly sophisticated. Obfuscated viruses created by virus writers automatically generate a new form of the virus with every iteration and download. These constantly evolving viruses pose a significant threat to the information security of individual users, organizations and even governments. However, the signature-based detection used by conventional commercial antivirus software fails to identify them, because no signatures are available for the new variants. This research proposed an alternative to the traditional signature-based detection method and investigated the use of machine learning for obfuscated virus detection. In this work, text strings extracted from virus program code are used as features to build a classifier model that can correctly classify obfuscated virus files. Text strings were chosen as features because they are informative and potentially require only a small amount of memory. Results show that unknown files can be classified with 99.5% accuracy using a Sequential Minimal Optimization (SMO) classifier model. These results suggest that current computer virus defenses can be strengthened through a machine learning approach.
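The abstract does not say how the strings were extracted or how the SMO classifier was configured. The sketch below illustrates the general pipeline under several assumptions. It treats runs of at least four printable ASCII bytes as strings, which matches the default of the Unix `strings` tool. It uses binary presence features over a vocabulary built from the training files. The trainer is a linear-kernel SVM trained with a simplified SMO loop that picks the second multiplier by the largest |E_i - E_j|, not Platt's full algorithm or any particular library's implementation. `extract_strings`, `to_vector`, `train_linear_smo` and `predict` are hypothetical helpers, and the byte samples are invented toy data, not data from the paper.

```python
import re

# Assumption: a "text string" is a run of >= 4 printable ASCII bytes (Unix `strings` default).
STRING_RE = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data: bytes) -> set[str]:
    """Return the set of printable text strings found in a file's raw bytes."""
    return {m.group().decode("ascii") for m in STRING_RE.finditer(data)}


def to_vector(strings: set[str], vocab: list[str]) -> list[float]:
    """Binary presence vector: 1.0 if the vocabulary string occurs in the file."""
    return [1.0 if s in strings else 0.0 for s in vocab]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_smo(X, y, C=1.0, tol=1e-3, max_passes=5, max_iter=1000):
    """Simplified SMO for a linear-kernel SVM; labels in y must be +1 / -1."""
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def error(k):  # E_k = f(x_k) - y_k under the current alpha and b
        return sum(alpha[m] * y[m] * K[m][k] for m in range(n)) + b - y[k]

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            errors = [error(k) for k in range(n)]
            E_i = errors[i]
            # Only update alpha_i if it violates the KKT conditions beyond tol.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            # Second-choice heuristic: the j with the largest |E_i - E_j| (ties -> lowest index).
            j = max((k for k in range(n) if k != i), key=lambda k: abs(E_i - errors[k]))
            E_j = errors[j]
            ai_old, aj_old = alpha[i], alpha[j]
            if y[i] != y[j]:
                lo, hi = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                lo, hi = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if lo == hi or eta >= 0:
                continue
            # Analytic optimum for alpha_j, clipped to the box [lo, hi].
            new_aj = min(hi, max(lo, aj_old - y[j] * (E_i - E_j) / eta))
            if abs(new_aj - aj_old) < 1e-5:
                continue
            alpha[j] = new_aj
            alpha[i] = ai_old + y[i] * y[j] * (aj_old - new_aj)
            # Recompute the threshold b from whichever multiplier is unbounded.
            di, dj = alpha[i] - ai_old, alpha[j] - aj_old
            b1 = b - E_i - y[i] * di * K[i][i] - y[j] * dj * K[i][j]
            b2 = b - E_j - y[i] * di * K[i][j] - y[j] * dj * K[j][j]
            b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    # Linear kernel: collapse the support vectors into one weight vector.
    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n)) for d in range(len(X[0]))]
    return w, b


def predict(w, bias, x) -> int:
    return 1 if dot(w, x) + bias >= 0 else -1


# Invented toy samples: +1 = virus-like, -1 = benign-like.
train = [
    (b"\x00\x01CreateRemoteThread\x00WriteProcessMemory\x90\x90", 1),
    (b"\xffWriteProcessMemory\x00VirtualAllocEx\x00", 1),
    (b"\x00Hello world\x00GetMessageA\x00", -1),
    (b"\x00\x00GetMessageA\x00DispatchMessageA\x00", -1),
]
docs = [extract_strings(data) for data, _ in train]
vocab = sorted(set().union(*docs))
X = [to_vector(d, vocab) for d in docs]
y = [label for _, label in train]
w, bias = train_linear_smo(X, y)

unknown = to_vector(extract_strings(b"\x10VirtualAllocEx\x00WriteProcessMemory\x00"), vocab)
print(predict(w, bias, unknown))  # expected: 1
```

With a linear kernel the trained multipliers collapse into a single weight vector `w`, so classifying an unknown file costs one dot product over the vocabulary. Strings in an unknown file that never appeared in training are simply ignored by `to_vector`.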
