JSidentify: A Hybrid Framework for Detecting Plagiarism Among JavaScript Code in Online Mini Games

Online mini games are lightweight game apps, typically implemented in JavaScript (JS), that run inside a host mobile app (such as WeChat, Baidu, and Alipay). These mini games do not need to be downloaded or upgraded through an app store, so a single host mobile app can aggregate the services of many apps. Tens of thousands of mini games, played by hundreds of millions of users, generate substantial profit and are consequently popular targets of plagiarism. In cases of plagiarism, the plagiarized code is often a deeply obfuscated clone of the original that carries malicious code segments and infringes copyright, posing great challenges for existing plagiarism detection tools. To address these challenges, we design and implement JSidentify, a hybrid framework to detect plagiarism among online mini games. JSidentify includes three techniques based on different levels of code abstraction, and it applies these techniques one by one, in the order given by a constructed priority list, to reduce overall detection time. Our evaluation results show that JSidentify outperforms existing state-of-the-art approaches, achieving the best precision and recall with affordable detection time both when detecting plagiarism among online mini games and when detecting clones among general JS programs. Our deployment experience also shows that JSidentify is indispensable in the daily operations of online mini games in WeChat.
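The idea of running techniques one by one from a priority list can be illustrated with a minimal Python sketch. It is not JSidentify's implementation: the abstract does not name the three techniques, so the two detectors here (`exact_match`, `token_jaccard`), the thresholds, and the `detect` function are hypothetical placeholders. The sketch only shows why ordering cheap checks first can save time: each detector either returns a verdict or defers to the next one.

```python
from typing import Callable, List, Optional, Tuple

# A detector returns True/False when it can decide, or None to defer.
# All names below are hypothetical; they are not taken from JSidentify.
Detector = Callable[[str, str], Optional[bool]]


def exact_match(a: str, b: str) -> Optional[bool]:
    """Cheapest check: identical text after whitespace normalization."""
    if " ".join(a.split()) == " ".join(b.split()):
        return True
    return None  # a mismatch proves nothing, so defer


def token_jaccard(a: str, b: str, hi: float = 0.8, lo: float = 0.2) -> Optional[bool]:
    """Costlier check: Jaccard similarity over whitespace-separated tokens.

    The thresholds `hi` and `lo` are assumed values for illustration only.
    """
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return None
    sim = len(ta & tb) / len(ta | tb)
    if sim >= hi:
        return True
    if sim <= lo:
        return False
    return None  # ambiguous band, defer to a later technique


# Cheapest technique first; a real system would append deeper analyses.
PRIORITY_LIST: List[Tuple[str, Detector]] = [
    ("exact_match", exact_match),
    ("token_jaccard", token_jaccard),
]


def detect(a: str, b: str,
           priority_list: List[Tuple[str, Detector]] = PRIORITY_LIST) -> Tuple[bool, str]:
    """Run detectors in priority order and stop at the first verdict."""
    for name, detector in priority_list:
        verdict = detector(a, b)
        if verdict is not None:
            return verdict, name
    return False, "undecided"
```

With this layout, a pair that differs only in whitespace is settled by the first detector and never reaches the token comparison, while a pair that no detector can settle is reported as `"undecided"` rather than silently classified.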
