Counter intrusion software: malware detection using structural and behavioural features and machine learning

Over the past twenty-five years malicious software has evolved from a minor annoyance into a major security threat. Authors of malicious software are now more likely to be organised criminals than bored teenagers, and modern malicious software is more likely to be aimed at stealing data (and hence money) than at trashing data. The arms race between malware authors and manufacturers of anti-malware software continues apace, yet the majority of anti-malware solutions still rely on relatively old technology such as signature scanning, which works well enough in most cases but has long been known to be ineffective if signatures are not updated regularly. The need for regular updating means there is often a critical window between the publication of a flaw exploitable by malware and the distribution of the appropriate countermeasures or signatures. During this window a user's system is open to attack by hitherto unseen malware. The object of this thesis is to determine whether it is practical to use machine learning techniques to abstract generic structural or behavioural features of malware, which can then be used to recognise hitherto unseen examples. Although a sizeable amount of research has been done on various ways in which malware detection might be automated, most of the proposed methods are burdened by excessive complexity. This thesis looks specifically at the possibility of using learning systems to classify software as malicious or non-malicious based on easily collectable structural or behavioural data. On the basis of the experimental results presented herein, it may be concluded that classification based on such structural data is certainly possible, and that classification based on behavioural data is at least feasible.
