Detecting faulty empty cells in spreadsheets

Spreadsheets play an important role in various business tasks, such as financial reports and data analysis. In spreadsheets, empty cells are widely used for different purposes, e.g., separating different tables, or default value "0". However, a user may delete a formula unintentionally, and leave a cell empty. Such ad-hoc modification may introduce a faulty empty cell that should have a formula. We observe that the context of an empty cell can help determine whether the empty cell is faulty. For example, is the empty cell next to a cell array in which all cells share the same semantics? Does the empty cell have headers similar to other non-empty cells'? In this paper, we propose EmptyCheck, to detect faulty empty cells in spreadsheets. By analyzing the context of an empty cell, EmptyCheck validates whether the cell belong to a cell array. If yes, the empty cell is faulty since it does not contain a formula. We evaluate EmptyCheck on 100 randomly sampled EUSES spreadsheets. The experimental result shows that EmptyCheck can detect faulty empty cells with high precision (75.00%) and recall (87.04%). Existing techniques can detect only 4.26% of the true faulty empty cells that EmptyCheck detects.

[1]  Wanjun Chen,et al.  CUSTODES: Automatic Spreadsheet Cell Clustering and Smell Detection Using Strong and Weak Features , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[2]  Martin Erwig,et al.  SheetDiff: A Tool for Identifying Changes in Spreadsheets , 2010, 2010 IEEE Symposium on Visual Languages and Human-Centric Computing.

[3]  Mary Shaw,et al.  The state of the art in end-user software engineering , 2011, ACM Comput. Surv..

[4]  Arie van Deursen,et al.  Detecting and refactoring code smells in spreadsheet formulas , 2013, Empirical Software Engineering.

[5]  Emery D. Berger,et al.  CheckCell: data debugging for spreadsheets , 2014, OOPSLA.

[6]  Martin Erwig,et al.  GoalDebug: A Spreadsheet Debugger for End Users , 2007, 29th International Conference on Software Engineering (ICSE'07).

[7]  Arie van Deursen,et al.  Detecting and visualizing inter-worksheet smells in spreadsheets , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[8]  Jun Wei,et al.  VEnron: A Versioned Spreadsheet Corpus and Related Evolution Analysis , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C).

[9]  Hugo Ribeiro,et al.  Towards a Catalog of Spreadsheet Smells , 2012, ICCSA.

[10]  Martin Erwig,et al.  UCheck: A spreadsheet type checker for end users , 2007, J. Vis. Lang. Comput..

[11]  Martin Erwig,et al.  Automatic detection of dimension errors in spreadsheets , 2009, J. Vis. Lang. Comput..

[12]  Felienne Hermans,et al.  Code smells in spreadsheet formulas revisited on an industrial dataset , 2015, 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[13]  Gregg Rothermel,et al.  The EUSES spreadsheet corpus: a shared resource for supporting experimentation with spreadsheet dependability mechanisms , 2005, ACM SIGSOFT Softw. Eng. Notes.

[14]  A.J.C. van Gemund,et al.  On the Accuracy of Spectrum-based Fault Localization , 2007, Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007).

[15]  Gregg Rothermel,et al.  What you see is what you test: a methodology for testing form-based visual programs , 1998, Proceedings of the 20th International Conference on Software Engineering.

[16]  Martin Erwig,et al.  AutoTest: A Tool for Automatic Test Case Generation in Spreadsheets , 2006, Visual Languages and Human-Centric Computing (VL/HCC'06).

[17]  Mary Shaw,et al.  Estimating the numbers of end users and end user programmers , 2005, 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05).

[18]  Chang Xu,et al.  CACheck: Detecting and Repairing Cell Arrays in Spreadsheets , 2017, IEEE Transactions on Software Engineering.

[19]  Rui Abreu,et al.  Constraint-based Debugging of Spreadsheets , 2012, CIbSE.

[20]  Franz Wotawa,et al.  Spectrum Enhanced Dynamic Slicing for better Fault Localization , 2012, ECAI.

[21]  Jie Wang,et al.  SpreadCluster: Recovering Versioned Spreadsheets through Similarity-Based Clustering , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[22]  Bas Jansen Enron versus EUSES: A Comparison of Two Spreadsheet Corpora , 2015, SEMS@ICSE.

[23]  Franz Wotawa,et al.  Improving Spectrum-Based Fault Localization for Spreadsheet Debugging , 2017, 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS).

[24]  Arie van Deursen,et al.  Data clone detection and visualization in spreadsheets , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[25]  Emerson R. Murphy-Hill,et al.  Enron's Spreadsheets and Related Emails: A Dataset and Analysis , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[26]  Rui Abreu,et al.  On the Empirical Evaluation of Fault Localization Techniques for Spreadsheets , 2013, FASE.

[27]  Jun Wei,et al.  Is spreadsheet ambiguity harmful? detecting and repairing spreadsheet smells due to ambiguous computation , 2014, ICSE.

[28]  Jácome Cunha,et al.  SmellSheet detective: A tool for detecting bad smells in spreadsheets , 2012, 2012 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

[29]  Jun Wei,et al.  Detecting table clones and smells in spreadsheets , 2016, SIGSOFT FSE.