A first step towards algorithm plagiarism detection

In this work, we address the problem of algorithm plagiarism, which occurs when a plagiarist, violating intellectual property rights, steals others' algorithms and covertly implements them. In contrast to software plagiarism, which has been extensively studied, limited attention has been paid to algorithm plagiarism. In this paper, we propose two dynamic value-based approaches, namely N-version and annotation, for algorithm plagiarism detection. Our approaches are motivated by the observation that there exist some critical runtime values which are irreplaceable and uneliminatable for all implementations of the same algorithm. The N-version approach extracts such values by filtering out non-core values. The annotation approach leverages auxiliary information to flag important variables which contain core values. We also propose a value dependence graph based similarity metric in addition to the longest common subsequence based one, in order to address the potential value reordering attack. We have implemented a prototype and evaluated the proposed schemes on various algorithms. The results show that our approaches to algorithm plagiarism detection are practical, effective and resilient to many automatic obfuscation techniques.

[1]  Akito Monden,et al.  Dynamic Software Birthmarks to Detect the Theft of Windows Applications , 2004 .

[2]  Christian S. Collberg,et al.  K-gram based software birthmarks , 2005, SAC '05.

[3]  Sencun Zhu,et al.  Value-based program characterization and its application to software plagiarism detection , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[4]  Sencun Zhu,et al.  Behavior based software theft detection , 2009, CCS.

[5]  Liming Chen,et al.  N-VERSION PROGRAMMINC: A FAULT-TOLERANCE APPROACH TO RELlABlLlTY OF SOFTWARE OPERATlON , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, ' Highlights from Twenty-Five Years'..

[6]  Michael Stepp,et al.  Dynamic path-based software watermarking , 2004, PLDI '04.

[7]  Fenlin Liu,et al.  A Software Birthmark Based on Dynamic Opcode n-gram , 2007, International Conference on Semantic Computing (ICSC 2007).

[8]  Christian S. Collberg,et al.  A Taxonomy of Obfuscating Transformations , 1997 .

[9]  R. Sekar,et al.  On the Limits of Information Flow Techniques for Malware Analysis and Containment , 2008, DIMVA.

[10]  Lukás Durfina,et al.  C source code obfuscator , 2012, Kybernetika.

[11]  Clark Thomborson,et al.  Tamper-proofing software watermarks , 2004 .

[12]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[13]  Susan Horwitz,et al.  Using Slicing to Identify Duplication in Source Code , 2001, SAS.

[14]  Koen De Bosschere,et al.  LOCO: an interactive code (De)obfuscation tool , 2006, PEPM '06.

[15]  Philip S. Yu,et al.  GPLAG: detection of software plagiarism by program dependence graph analysis , 2006, KDD '06.

[16]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[17]  Zhendong Su,et al.  Scalable detection of semantic clones , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[18]  David Schuler,et al.  A dynamic birthmark for java , 2007, ASE.

[19]  Sencun Zhu,et al.  Detecting Software Theft via System Call Based Birthmarks , 2009, 2009 Annual Computer Security Applications Conference.

[20]  Christian S. Collberg,et al.  Detecting Software Theft via Whole Program Path Birthmarks , 2004, ISC.

[21]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[22]  Christian S. Collberg,et al.  Software watermarking: models and dynamic embeddings , 1999, POPL '99.

[23]  Éva Tardos,et al.  Algorithm design , 2005 .

[24]  Lutz Prechelt,et al.  JPlag: Finding plagiarisms among a set of programs , 2000 .

[25]  Patrick Cousot,et al.  An abstract interpretation-based framework for software watermarking , 2004, POPL.

[26]  Daniel Shawcross Wilkerson,et al.  Winnowing: local algorithms for document fingerprinting , 2003, SIGMOD '03.

[27]  Daniel J. Quinlan,et al.  Detecting code clones in binary executables , 2009, ISSTA.

[28]  Akito Monden,et al.  Design and evaluation of birthmarks for detecting theft of java programs , 2004, IASTED Conf. on Software Engineering.

[29]  Michael Philippsen,et al.  Finding Plagiarisms among a Set of Programs with JPlag , 2002, J. Univers. Comput. Sci..

[30]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX Annual Technical Conference, FREENIX Track.

[31]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[32]  David W. Binkley,et al.  Program slicing , 2008, 2008 Frontiers of Software Maintenance.

[33]  James Newsome,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and SignatureGeneration of Exploits on Commodity Software , 2005, NDSS.

[34]  Algirdas Avizienis,et al.  The N-Version Approach to Fault-Tolerant Software , 1985, IEEE Transactions on Software Engineering.