Mining Patterns in Source Code Using Tree Mining Algorithms

Discovering regularities in source code is of great interest to software engineers in both academia and industry, because regularities can support a variety of tasks such as code comprehension, code refactoring, and fault localisation. However, traditional pattern mining algorithms often find too many patterns of little use and are therefore unsuitable for discovering useful regularities. In this paper, we propose FREQTALS, a new algorithm for mining patterns in source code, based on the FREQT tree mining algorithm. First, we introduce several constraints that enable us to find more useful patterns; then, we show how to incorporate them efficiently into FREQT. To illustrate the usefulness of the constraints, we carried out a case study in collaboration with software engineers, in which we identified a number of interesting patterns in a repository of Java code.
