Comparing Approaches to Mining Source Code for Call-Usage Patterns

Two approaches for mining function-call usage patterns from source code are compared. The first approach, itemset mining, has recently been applied to this problem. The other approach, sequential-pattern mining, has not previously been applied to it. Here, a call-usage pattern is a combination of function calls that occur within a function definition. Both approaches look for frequently occurring patterns that represent standard usage of functions and that help identify possible errors. Itemset mining produces unordered patterns, i.e., sets of function calls, whereas sequential-pattern mining produces partially ordered patterns, i.e., sequences of function calls. The trade-off between the additional ordering context given by sequential-pattern mining and the efficiency of itemset mining is investigated. The two approaches are applied to the Linux kernel v2.6.14, and the results show that mining ordered patterns is worth the additional cost.
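The difference between unordered and ordered patterns can be illustrated with a minimal sketch. It is not the mining tooling used in the study: the toy call lists and the helper names `itemset_support` and `sequence_support` are hypothetical, and support is counted by brute-force enumeration rather than by a real frequent-pattern miner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: the ordered calls made inside each function definition.
functions = {
    "f1": ["lock", "read", "unlock"],
    "f2": ["lock", "write", "unlock"],
    "f3": ["unlock", "lock"],  # same calls as a pair, opposite order
    "f4": ["lock", "read", "unlock"],
}


def itemset_support(functions, size=2):
    """Count how many function definitions contain each unordered set of calls."""
    counts = Counter()
    for calls in functions.values():
        # Sorting the distinct calls discards order, so each set has one canonical key.
        for itemset in combinations(sorted(set(calls)), size):
            counts[itemset] += 1
    return counts


def sequence_support(functions, size=2):
    """Count how many function definitions contain each ordered subsequence of calls."""
    counts = Counter()
    for calls in functions.values():
        # Index combinations are increasing, so call order is preserved.
        seen = {
            tuple(calls[i] for i in idxs)
            for idxs in combinations(range(len(calls)), size)
        }
        # Each definition contributes at most once per distinct subsequence.
        for seq in seen:
            counts[seq] += 1
    return counts


if __name__ == "__main__":
    print(itemset_support(functions)[("lock", "unlock")])
    print(sequence_support(functions)[("lock", "unlock")])
    print(sequence_support(functions)[("unlock", "lock")])
```

In this toy data the unordered pair of `lock` and `unlock` is supported by all four definitions, so `f3` looks like standard usage. Counting ordered subsequences separates the three definitions that call `lock` before `unlock` from the one that does the reverse, which is the kind of ordering context the abstract weighs against the lower cost of itemset mining.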
