An integrated malware detection and classification system

This thesis develops effective and efficient methodologies that can be applied to continuously improve the detection and classification of malware collected over an extended period of time. The robustness of the proposed methodologies was tested on malware collected from 2003 to 2010.
