Chatter: Classifying malware families using system event ordering

Using runtime execution artifacts to identify malware and its associated "family" is an established technique in the security domain. Many papers in the literature rely on explicit features derived from network, file system, or registry interaction. While effective, these techniques are computationally expensive because they depend on such fine-grained data points. Moreover, the signatures and heuristics this analysis produces are often circumvented by subsequent malware authors. To address these limitations we propose CHATTER, a system that is concerned only with the order in which high-level system events take place. Individual events are mapped onto an alphabet, and execution traces are captured as terse concatenations of those letters. Then, leveraging an analyst-labeled corpus of malware, n-gram document classification techniques are applied to produce a classifier that predicts malware family. This paper describes that technique and its proof-of-concept evaluation. In its prototype form, only network events are considered and three malware families are highlighted. We show the technique achieves roughly 80% accuracy in isolation and makes non-trivial performance improvements when integrated with a baseline classifier of non-ordered features (with an accuracy of roughly 95%).
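The pipeline described above (events mapped to letters, traces as letter strings, n-gram classification over a labeled corpus) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the event names, the letter assignments in ALPHABET, the family labels, the toy training traces, the choice of bigrams, and the use of a multinomial naive Bayes classifier as the n-gram document classifier.

```python
import math
from collections import Counter, defaultdict

# Hypothetical alphabet: high-level network event -> letter (illustrative only).
ALPHABET = {
    "dns_query": "D",
    "tcp_connect": "T",
    "http_get": "G",
    "http_post": "P",
    "udp_send": "U",
}


def encode(events):
    """Concatenate the letters of an ordered event list into a trace string."""
    return "".join(ALPHABET[e] for e in events)


def ngrams(trace, n=2):
    """Count the overlapping character n-grams of a trace string."""
    return Counter(trace[i:i + n] for i in range(len(trace) - n + 1))


class NGramNaiveBayes:
    """Multinomial naive Bayes over n-gram counts with add-one smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training traces
        self.vocab = set()                  # every n-gram seen in training

    def fit(self, traces, labels):
        for trace, label in zip(traces, labels):
            grams = ngrams(trace, self.n)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, trace):
        grams = ngrams(trace, self.n)
        total_docs = sum(self.docs.values())
        best, best_score = None, -math.inf
        for label in self.docs:
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)  # class prior
            for gram, k in grams.items():
                if gram not in self.vocab:
                    continue  # ignore n-grams never seen during training
                p = (self.counts[label][gram] + 1) / (total + len(self.vocab))
                score += k * math.log(p)
            if score > best_score:
                best, best_score = label, score
        return best


# Toy labeled corpus (made up): two hypothetical families with distinct orderings.
train_traces = ["DTGDTG", "DTGG", "UUPUUP", "UPUP"]
train_labels = ["family_a", "family_a", "family_b", "family_b"]
model = NGramNaiveBayes(n=2).fit(train_traces, train_labels)

unknown = encode(["dns_query", "tcp_connect", "http_get", "dns_query", "tcp_connect"])
print(unknown, model.predict(unknown))
```

Because only the letter order is kept, the feature space stays small (at most |alphabet|^n n-grams), which is the kind of cost reduction the abstract motivates. A real deployment would need a far larger corpus and a tuned choice of n.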
