Internet-scale Real-time Code Clone Search Via Multi-level Indexing

Finding lines of code similar to a code fragment across large knowledge bases in fractions of a second is a new branch of code clone research also known as real-time code clone search. Among the requirements real-time code clone search has to meet are scalability, short response time, scalable incremental corpus updates, and support for type-1, type-2, and type-3 clones. We conducted a set of empirical studies on a large open source code corpus to gain insight about its characteristics. We used these results to design and optimize a multi-level indexing approach using hash table-based and binary search to improve Internet-scale real-time code clone search response time. Finally, we performed an evaluation on an Internet-scale corpus (1.5 million Java files and 266 MLOC). Our approach maintains a response time for 99.9% of clone searches in the microseconds range, while supporting the aforementioned requirements.

[1]  Juergen Rilling,et al.  Semantic Web-based Source Code Search , 2010 .

[2]  Ying Zou,et al.  A Technique for Just-in-Time Clone Detection in Large Scale Systems , 2010, 2010 IEEE 18th International Conference on Program Comprehension.

[3]  Chanchal Kumar Roy,et al.  Comparison and evaluation of code clone detection techniques and tools: A qualitative approach , 2009, Sci. Comput. Program..

[4]  Elmar Jürgens,et al.  Index-based code clone detection: incremental, distributed, scalable , 2010, 2010 IEEE International Conference on Software Maintenance.

[5]  Iman Keivanloo,et al.  SeClone - A Hybrid Approach to Internet-Scale Real-Time Code Clone Search , 2011, 2011 IEEE 19th International Conference on Program Comprehension.

[6]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[7]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[8]  Seung-won Hwang,et al.  Instant code clone search , 2010, FSE '10.

[9]  Hajimu Iida,et al.  SHINOBI: A Tool for Automatic Code Clone Detection in the IDE , 2009, 2009 16th Working Conference on Reverse Engineering.