Malicious Code Detection Using Penalized Splines on OPcode Frequency

Malicious software has recently grown exponentially, driven by the innumerable obfuscations of extended x86 IA-32 operation codes (OPcodes) that are employed to evade traditional detection methods. In this paper, we design a novel distinguisher that separates malware from benign software by combining a Multivariate Logistic Regression model, using kernel HS in Penalized Splines, with an OPcode frequency feature selection technique for efficiently detecting obfuscated malware. The main advantage of our penalized-splines-based feature selection technique is its performance, achieved by efficiently filtering and identifying the most important OPcodes used in the obfuscation of malware. We demonstrate this through an implementation of the proposed model and experimental results on large malware datasets. The presented approach is effective at identifying previously examined malware and non-malware to assist in reverse engineering.
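To make the two ingredients named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sample is already disassembled into a list of mnemonics, uses a truncated-linear spline basis as one common form of penalized spline, and applies a ridge penalty to the knot coefficients only. The function names, the vocabulary, the knot positions, and the smoothing parameter `lam` are all hypothetical and chosen for illustration.

```python
from collections import Counter
import math


def opcode_frequencies(opcodes, vocabulary):
    """Relative frequency of each vocabulary opcode in one disassembled sample."""
    counts = Counter(op.lower() for op in opcodes)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[op] / total for op in vocabulary]


def truncated_linear_basis(x, knots):
    """Spline basis for one feature value: [x, (x - k1)+, (x - k2)+, ...]."""
    return [x] + [max(0.0, x - k) for k in knots]


def penalized_logistic_loss(design, labels, intercept, coefs, is_knot, lam):
    """Logistic negative log-likelihood plus a ridge penalty on knot terms.

    design  : list of rows, each a list of basis values
    labels  : 0 (benign) or 1 (malware) per row
    is_knot : True where the coefficient belongs to a truncated (knot) term;
              only those coefficients are penalized (an assumption of this sketch)
    """
    nll = 0.0
    for row, y in zip(design, labels):
        eta = intercept + sum(c * v for c, v in zip(coefs, row))
        # log(1 + exp(eta)) - y * eta, written in a numerically stable form
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - y * eta
    penalty = lam * sum(c * c for c, k in zip(coefs, is_knot) if k)
    return nll + penalty


# Hypothetical usage: one feature (frequency of "mov") expanded with two knots.
vocab = ["mov", "push", "call", "xor"]
sample = ["mov", "push", "mov", "call"]
freqs = opcode_frequencies(sample, vocab)
row = truncated_linear_basis(freqs[0], knots=[0.25, 0.75])
loss = penalized_logistic_loss([row], [1], 0.0, [0.0, 0.0, 0.0],
                               [False, True, True], lam=0.5)
```

In a full model the loss would be minimized over the intercept and coefficients, and OPcodes whose fitted spline terms contribute little could then be filtered out; that fitting and selection step is omitted here.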
