Paper Information - Automatic Classifying of Mac OS X Samples

Automatic Classifying of Mac OS X Samples

Because of the rapidly increasing volume of malware, the security industry has struggled for many years to improve automatic malware classification. Many recent market research reports have suggested that the growth of Apple’s Mac OS X has outpaced that of PC platforms for several years. This shift is attracting more malware authors to develop malware for Mac OS X. In this paper, we present a study of classifying Mac OS X malware with a set of features extracted from Mach-O metadata and its derivatives in a sample collection from VirusTotal. Like the PE format for Windows, the Mach-O format provides a variety of features for classification. We collected more than 300,000 Mach-O samples submitted to VirusTotal during 2015–16, and filtered out irrelevant samples, such as samples for iOS and PowerPC. We then generated metadata from the Mach-O files using tools such as nm, otool and strings. Meta information from the sample files, such as segment and section structures, imported functions of dynamic libraries and printable strings, was used as features for classifying Mac OS X samples. Additionally, we included derivative numerical features created from the meta information, such as function call distribution and structure complexity, which have been widely adopted in recent research on learning-based malware classification. This study summarizes the statistical changes across Mac OS X malware families and the structural trends in benign and malicious samples between 2015 and 2016. With a collection of more than 300,000 files and over 4,000 malicious samples, our feature evaluation is based on a composition analysis of different malware families in terms of both meta and derivative features. We use a variety of classification algorithms to build predictive models from the 2015 dataset, test them on the 2016 samples, and compare the results with AV vendors’ detections on VirusTotal. We also discuss the effectiveness of selected features by ranking their importance in a predictive model across our classification tests on the 2015–16 dataset.
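As a rough illustration of what a derivative feature such as "function call distribution" might look like, the Python sketch below turns a listing of imported symbols into the fraction of imports that fall into a few coarse categories. It is not the authors' method: the category table, the function names `parse_undefined_symbols` and `import_distribution`, and the assumed input shape (one symbol per line, optionally preceded by a `U` type column, as an `nm`-style listing of undefined symbols might look) are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical grouping of imported symbols by name prefix.
# The categories and prefixes are illustrative, not taken from the paper.
CATEGORIES = {
    "network": ("_socket", "_connect", "_send", "_recv", "_getaddrinfo"),
    "process": ("_fork", "_exec", "_system", "_posix_spawn", "_kill"),
    "file": ("_open", "_read", "_write", "_unlink", "_fopen"),
}


def parse_undefined_symbols(listing: str) -> list:
    """Extract symbol names from a text listing with one symbol per line.

    Assumes each non-empty line ends with the symbol name, so both
    "_open" and "         U _open" are accepted. Lines ending in ":"
    (file-name headers) are skipped.
    """
    symbols = []
    for line in listing.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        symbols.append(line.split()[-1])
    return symbols


def import_distribution(symbols) -> dict:
    """Return the fraction of symbols in each category, plus "other"."""
    counts = Counter()
    for sym in symbols:
        for category, prefixes in CATEGORIES.items():
            if sym.startswith(prefixes):  # str.startswith accepts a tuple
                counts[category] += 1
                break
        else:
            counts["other"] += 1
    total = sum(counts.values())
    keys = list(CATEGORIES) + ["other"]
    return {k: (counts[k] / total if total else 0.0) for k in keys}


if __name__ == "__main__":
    sample = "sample.bin:\n         U _socket\n         U _connect\n_fopen\n_printf\n"
    print(import_distribution(parse_undefined_symbols(sample)))
```

A fixed-length vector of this kind could then be fed, alongside other features, to whichever classifier is being evaluated; the choice of categories would need to be validated against real samples.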

Haoping Liu | Spencer Hsieh
