An Efficient Framework to Build Up Malware Dataset

This paper presents a framework for building a malware dataset. Researchers often spend considerable time cleaning noise from a dataset or transforming it into a format that can be used directly for testing. This research therefore proposes a framework that helps researchers speed up the malware dataset cleaning processes so that the resulting dataset can be used for testing. An efficient dataset cleaning process is believed to improve the quality of the data, which in turn helps improve the accuracy and efficiency of the subsequent analysis. In addition, an in-depth understanding of malware taxonomy is important before and during the dataset cleaning processes, and a new Trojan classification is proposed to complement the framework. The experiment was conducted in a controlled lab environment using the Vx Heavens dataset. The framework integrates static and dynamic analyses, an incident response method, and knowledge database discovery (KDD) processes. It can serve as a basic guideline for malware researchers building malware datasets.

Keywords: dataset, knowledge database discovery (KDD), malware, static and dynamic analyses.
