An evaluation of source code mining techniques

This paper reviews the tools and techniques which rely only on data mining methods to determine patterns from source code such as programming rules, copy paste code segments, and API usage. The work provides comparison and evaluation of the current state-of-the-art in source code mining techniques. Furthermore it identifies the essential strengths and weaknesses of individual tools and techniques to make an evaluation indicative of future potential. The pervious related works only focus on one specific pattern being mined such as special kind of bug detection. Thus, there is a need of multiple tools to test and find potential information from software which increase cost and time of development. Hence there is a strong need of tool which helps in developing quality software by automatically detecting different kind of bugs in one pass and also provides code reusability for the developers.

[1]  Rastislav Bodík,et al.  Jungloid mining: helping to navigate the API jungle , 2005, PLDI '05.

[2]  Yuanyuan Zhou,et al.  CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code , 2004, OSDI.

[3]  Suresh Jagannathan,et al.  Path-Sensitive Inference of Function Precedence Protocols , 2007, 29th International Conference on Software Engineering (ICSE'07).

[4]  Dawson R. Engler,et al.  Bugs as deviant behavior: a general approach to inferring errors in systems code , 2001, SOSP.

[5]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[6]  Dietmar Seipel,et al.  Clone detection in source code by frequent itemset techniques , 2004 .

[7]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[8]  Jian Pei,et al.  Mining Software Engineering Data , 2007, ICSE Companion.

[9]  Jiong Yang,et al.  Finding what's not there: a new approach to revealing neglected conditions in software , 2007, ISSTA '07.

[10]  Zhenmin Li,et al.  PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code , 2005, ESEC/FSE-13.

[11]  Amir Michail,et al.  Data mining library reuse patterns using generalized association rules , 2000, Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium.

[12]  Jian Pei,et al.  MAPO: mining API usages from open source repositories , 2006, MSR '06.

[13]  Xiao Ma,et al.  MUVI: automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs , 2007, SOSP.

[14]  Tao Xie,et al.  Parseweb: a programmer assistant for reusing open source code on the web , 2007, ASE.

[15]  Kajal T. Claypool,et al.  XSnippet: mining For sample code , 2006, OOPSLA '06.

[16]  Jian Pei,et al.  Mining API patterns as partial orders from source code: from usage scenarios to specifications , 2007, ESEC-FSE '07.

[17]  Gail C. Murphy,et al.  Using structural context to recommend source code examples , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..