Beyond support and confidence: Exploring interestingness measures for rule-based specification mining

Numerous rule-based specification mining approaches have been proposed in the literature. Many of these approaches analyze a set of execution traces to discover interesting usage rules, e.g., whenever lock() is invoked, eventually unlock() is invoked. These techniques typically enumerate a set of candidate rules and compute an interestingness score for each; rules whose scores exceed a certain threshold are then output. In past studies, two well-known measures, namely support and confidence, are most often used to compute these scores. However, many other interestingness measures have been proposed, and it is thus unclear whether support and confidence are the best interestingness measures for specification mining. In this work, we perform an empirical study that investigates the utility of 38 interestingness measures in recovering correct specifications of classes from Java libraries. We use a ground-truth dataset consisting of 683 rules and execution traces recorded while running the DaCapo benchmark suite, and we apply the 38 interestingness measures to identify correct rules from a pool of candidate rules. Our study shows that many measures are on par with support and confidence, that some are even better than support or confidence, and that at least one measure is statistically significantly better than both. We also find that compositions of several measures with support statistically significantly outperform the composition of support and confidence. Our findings highlight the need to look beyond standard support and confidence to find interesting rules.
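To make the two baseline measures concrete, the sketch below (not taken from the paper) shows one common trace-based way of computing them for the example rule "whenever lock() is invoked, eventually unlock() is invoked". Each trace is modeled as a plain list of method names, support is counted as the number of traces in which the premise occurs, and confidence is the fraction of those traces in which every lock() is eventually followed by an unlock(); this representation and the helper name support_and_confidence are illustrative assumptions, since exact definitions vary across mining approaches.

# Illustrative sketch only, not the paper's implementation.
# Computes support and confidence of the temporal rule
# "whenever `premise` occurs, `conclusion` eventually occurs"
# over a set of execution traces (each trace is a list of method names).

def support_and_confidence(traces, premise, conclusion):
    premise_traces = 0      # traces in which the premise occurs (support)
    satisfying_traces = 0   # traces in which the rule also holds
    for trace in traces:
        if premise not in trace:
            continue  # rule is vacuously true; does not count toward support
        premise_traces += 1
        # Every occurrence of the premise must be followed, later in the
        # same trace, by at least one occurrence of the conclusion.
        satisfied = all(
            conclusion in trace[i + 1:]
            for i, event in enumerate(trace)
            if event == premise
        )
        if satisfied:
            satisfying_traces += 1
    confidence = satisfying_traces / premise_traces if premise_traces else 0.0
    return premise_traces, confidence

if __name__ == "__main__":
    traces = [
        ["open", "lock", "write", "unlock", "close"],
        ["lock", "read", "unlock", "lock", "write", "unlock"],
        ["lock", "read", "close"],    # violates the rule
        ["open", "read", "close"],    # rule not exercised
    ]
    sup, conf = support_and_confidence(traces, "lock", "unlock")
    print(f"support = {sup}, confidence = {conf:.2f}")  # support = 3, confidence = 0.67

In the empirical study, each of the 38 interestingness measures plays the role that confidence plays here: it assigns a score to every candidate rule, and rules above a threshold are reported.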
