The Software Similarity Problem in Malware Analysis

In software engineering, programs are compared for similarity in order to detect duplicate code, which can indicate poor design, and to reconstruct evolution history. Malicious software, being simply a particular type of software, can likewise be compared for similarity in order to detect commonalities and reconstruct evolution history. This paper provides a brief introduction to the problem of measuring similarity between malicious programs and to how evolution is known to occur in this area. It then uses this review to draw connections between research in software engineering (e.g., on "clone detection") and problems in anti-malware research.

[1]  Peter Szor,et al.  HUNTING FOR METAMORPHIC , 2001 .

[2]  William S. Evans Program Compression , 2006, Duplication, Redundancy, and Similarity in Software.

[3]  Mattia Monga,et al.  Using Code Normalization for Fighting Self-Mutating Malware , 2006, ISSSE.

[4]  Brenda S. Baker,et al.  Parameterized Duplication in Strings: Algorithms and an Application to Software Maintenance , 1997, SIAM J. Comput..

[5]  Udi Manber,et al.  Deducing Similarities in Java Sources from Bytecodes , 1998, USENIX Annual Technical Conference.

[6]  Rainer Koschke,et al.  Survey of Research on Software Clones , 2006, Duplication, Redundancy, and Similarity in Software.

[7]  Fintan Culwin,et al.  A Comparison of Source Code Plagiarism Detection Engines , 2004, Comput. Sci. Educ..

[8]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[9]  Robert Lyda,et al.  Exploring Investigative Methods for Identifying and Profiling Serial Bots , 2006, J. Digit. Forensic Pract..

[10]  Peter Szor,et al.  The Art of Computer Virus Research and Defense , 2005 .

[11]  Andrew Walenstein,et al.  Normalizing Metamorphic Malware Using Term Rewriting , 2006, 2006 Sixth IEEE International Workshop on Source Code Analysis and Manipulation.

[12]  Andrew Walenstein,et al.  Malware phylogeny generation using permutations of code , 2005, Journal in Computer Virology.

[13]  Andrew Walenstein,et al.  Exploiting Similarity Between Variants to Defeat Malware “ Vilo ” Method for Comparing and Searching Binary Programs , 2007 .

[14]  Katsuro Inoue,et al.  Measuring Similarity of Large Software Systems Based on Source Code Correspondence , 2005, PROFES.

[15]  Christopher Krügel,et al.  Polymorphic Worm Detection Using Structural Information of Executables , 2005, RAID.

[16]  Michael W. Godfrey,et al.  Using origin analysis to detect merging and splitting of source code entities , 2005, IEEE Transactions on Software Engineering.