How effectively can spreadsheet anomalies be detected: An empirical study

Abstract: Spreadsheets are widely used, but they have been found to be error-prone. Various techniques have been proposed to detect anomalies in spreadsheets, with varying scopes and effectiveness. Nevertheless, no empirical study has compared these techniques’ practical usefulness and effectiveness. In this work, we conducted a large-scale empirical study of three state-of-the-art techniques to evaluate how effectively they detect spreadsheet anomalies. Our study focused on precision, recall, efficiency, and scope. We found that one technique outperforms the other two in both precision and recall. The efficiency of the three techniques is acceptable for most spreadsheets, but they may not scale to large spreadsheets with complex formulas. In addition, the techniques differ in the kinds of spreadsheet anomalies they detect and thus complement each other. We also discuss the limitations of these three techniques. Based on our findings, we give suggestions for future spreadsheet research.
