Mining Change Histories for Unknown Systematic Edits

Software developers often need to repeat similar modifications in multiple different locations of a system's source code. These repeated similar modifications, or systematic edits, can be both tedious and error-prone to perform manually. While there are tools that can be used to assist in automating systematic edits, it is not straightforward to find out where the occurrences of a systematic edit are located in an existing system. This knowledge is valuable to help decide whether refactoring is needed, or whether future occurrences of an existing systematic edit should be automated. In this paper, we tackle the problem of finding unknown systematic edits using a closed frequent itemset mining algorithm, operating on sets of distilled source code changes. This approach has been implemented for Java programs in a tool called SysEdMiner. To evaluate the tool's precision and scalability, we have applied it to an industrial use case.

[1]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[2]  Stas Negara,et al.  Mining fine-grained code changes to detect unknown change patterns , 2014, ICSE.

[3]  Robert L. Grossman,et al.  Proceedings of the 2002 SIAM International Conference on Data Mining , 2002 .

[4]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.

[5]  Peter Weiner,et al.  Linear Pattern Matching Algorithms , 1973, SWAT.

[6]  Heikki Mannila,et al.  Discovering frequent episodes in sequences extended abstract , 1995, KDD 1995.

[7]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[8]  Miryung Kim,et al.  Does Automated Refactoring Obviate Systematic Editing? , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[9]  Michael Philippsen,et al.  Automatic Clustering of Code Changes , 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[10]  Jennifer Widom,et al.  Change detection in hierarchically structured information , 1996, SIGMOD '96.

[11]  Gail C. Murphy,et al.  Predicting source code changes by mining change history , 2004, IEEE Transactions on Software Engineering.

[12]  Coen De Roover,et al.  Prevalence and Maintenance of Automated Functional Tests for Web Applications , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[13]  Martin P. Robillard,et al.  Temporal analysis of API usage concepts , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[14]  Miryung Kim,et al.  Discovering and representing systematic code changes , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[15]  Michael D. Ernst,et al.  CBCD: Cloned buggy code detector , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[16]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[17]  Eleni Stroulia,et al.  API-Evolution Support with Diff-CatchUp , 2007, IEEE Transactions on Software Engineering.

[18]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[19]  Andreas Zeller,et al.  Mining version histories to guide software changes , 2005, Proceedings. 26th International Conference on Software Engineering.

[20]  Miryung Kim,et al.  Lase: Locating and applying systematic edits by learning from examples , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[21]  Reinout Stevens A Declarative Foundation for Comprehensive History Querying , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[22]  Andy Zaidman,et al.  Identifying cross-cutting concerns using software repository mining , 2010, IWPSE-EVOL '10.

[23]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[24]  Mohammed J. Zaki,et al.  Efficient algorithms for mining closed itemsets and their lattice structure , 2005, IEEE Transactions on Knowledge and Data Engineering.

[25]  Miryung Kim,et al.  A graph-based approach to API usage adaptation , 2010, OOPSLA.

[26]  Harald C. Gall,et al.  Change Distilling:Tree Differencing for Fine-Grained Source Code Change Extraction , 2007, IEEE Transactions on Software Engineering.

[27]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.