Author Identification of Software Source Code with Program Dependence Graphs

With the significant increase in computer- and Internet-based crime, it is increasingly important to have techniques that can be applied in a legal setting to assist courts in making judgements about malware, code theft, and computer fraud. To better address author identification of software, we propose a semantic approach that identifies authorship by comparing program data flows. We compute program dependences and, where code theft must be detected, program similarity; this lets us query not only the syntactic structure of programs but also the data flow within them in order to discriminate between authors. Experimental results show that the technique remains robust even under some intentional code modifications.
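To make the idea of comparing data flow rather than surface syntax concrete, the following is a minimal illustrative sketch, not the method evaluated in the paper. It assumes straight-line Python code with no control flow, builds def-use (data dependence) edges between statements, labels each statement by its kind and operators so that identifier names play no role, and scores two programs by the multiset Jaccard overlap of their edges. The function names `statement_label`, `data_dependence_edges`, and `similarity` are hypothetical, and the choice of Jaccard overlap is an assumption made for illustration only.

```python
import ast
from collections import Counter


def statement_label(stmt):
    """Name-independent label: the statement kind plus the operators it uses."""
    ops = sorted(
        type(node.op).__name__
        for node in ast.walk(stmt)
        if isinstance(node, (ast.BinOp, ast.UnaryOp))
    )
    return type(stmt).__name__ + "(" + ",".join(ops) + ")"


def data_dependence_edges(source):
    """Count (defining statement label, using statement label) edges.

    Assumption: only top-level, straight-line statements are handled;
    branches, loops, functions, and augmented assignments are ignored.
    """
    last_def = {}      # variable name -> label of the statement that last wrote it
    edges = Counter()  # multiset of data dependence edges
    for stmt in ast.parse(source).body:
        label = statement_label(stmt)
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Record reads first, so a statement never depends on its own write.
        for name in names:
            if isinstance(name.ctx, ast.Load) and name.id in last_def:
                edges[(last_def[name.id], label)] += 1
        for name in names:
            if isinstance(name.ctx, ast.Store):
                last_def[name.id] = label
    return edges


def similarity(source_a, source_b):
    """Multiset Jaccard overlap of the two programs' dependence edges."""
    a = data_dependence_edges(source_a)
    b = data_dependence_edges(source_b)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


original = "a = 1\nb = a + 2\nc = b * a\nprint(c)\n"
renamed = "x = 1\ny = x + 2\nz = y * x\nprint(z)\n"   # identifiers changed only
different = "a = 1\nb = 2\nprint(a)\nprint(b)\n"      # different data flow

print(similarity(original, renamed))
print(similarity(original, different))
```

Because the edges are built from statement kinds and operators rather than identifiers, renaming every variable leaves the score unchanged, whereas a program with a different flow of data between statements scores lower. A full program dependence graph would also include control dependences, which this sketch omits.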
