Avoiding, finding and fixing spreadsheet errors - A survey of automated approaches for spreadsheet QA

Spreadsheet programs can be found everywhere in organizations and they are used for a variety of purposes, including financial calculations, planning, data aggregation and decision making tasks. A number of research surveys have however shown that such programs are particularly prone to errors. Some reasons for the error-proneness of spreadsheets are that spreadsheets are developed by end users and that standard software quality assurance processes are mostly not applied. Correspondingly, during the last two decades, researchers have proposed a number of techniques and automated tools aimed at supporting the end user in the development of error-free spreadsheets. In this paper, we provide a review of the research literature and develop a classification of automated spreadsheet quality assurance (QA) approaches, which range from spreadsheet visualization, static analysis and quality reports, over testing and support to model-based spreadsheet development. Based on this review, we outline possible opportunities for future work in the area of automated spreadsheet QA.

[1]  Simon Thorne A Review of Spreadsheet Error Reduction Techniques , 2009, Commun. Assoc. Inf. Syst..

[2]  Regina Nuzzo,et al.  Scientific method: Statistical errors , 2014, Nature.

[3]  Martin Erwig,et al.  Mutation Operators for Spreadsheets , 2009, IEEE Transactions on Software Engineering.

[4]  Gregg Rothermel,et al.  A methodology for testing spreadsheets , 2001, TSEM.

[5]  Stephen G. Powell,et al.  A critical review of the literature on spreadsheet errors , 2008, Decis. Support Syst..

[6]  Raymond R. Panko,et al.  What we know about spreadsheet errors , 1998 .

[7]  John D. Gould,et al.  An experimental study of people creating spreadsheets , 1987, TOIS.

[8]  Martin Erwig,et al.  Header and Unit Inference for Spreadsheets Through Spatial Analyses , 2004, 2004 IEEE Symposium on Visual Languages - Human Centric Computing.

[9]  Danny Dig,et al.  Refactoring meets spreadsheet formulas , 2012, 2012 28th IEEE International Conference on Software Maintenance (ICSM).

[10]  Brian Knight,et al.  Classification of Spreadsheet Errors , 2008, ArXiv.

[11]  Martin Erwig,et al.  A Type System Based on End-User Vocabulary , 2007 .

[12]  Gregg Rothermel,et al.  Testing Homogeneous Spreadsheet Grids with the "What You See Is What You Test" Methodology , 2002, IEEE Trans. Software Eng..

[13]  Martin Erwig Software Engineering for Spreadsheets , 2009, IEEE Software.

[14]  Arie van Deursen,et al.  Detecting and visualizing inter-worksheet smells in spreadsheets , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[15]  Martin Erwig,et al.  Visual specifications of correct spreadsheets , 2005, 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05).

[16]  J.Steve Davis Tools for spreadsheet auditing , 1996, Int. J. Hum. Comput. Stud..

[17]  Markus Clermont Heuristics for the automatic identification of irregularities in spreadsheets , 2005, WEUSE@ICSE.

[18]  Markus Stumptner,et al.  Consistency-based diagnosis of configuration knowledge bases , 1999, Artif. Intell..

[19]  Peter Sestoft,et al.  Sheet-Defined Functions: Implementation and Initial Evaluation , 2013, IS-EUD.

[20]  Sabine Hipfl Using Layout Information for Spreadsheet Visualization , 2008, ArXiv.

[21]  Martin Erwig,et al.  UCheck: A spreadsheet type checker for end users , 2007, J. Vis. Lang. Comput..

[22]  Rui Abreu,et al.  On the Empirical Evaluation of Fault Localization Techniques for Spreadsheets , 2013, FASE.

[23]  Dennis F. Galletta,et al.  Spreadsheet Presentation and Error Detection: An Experimental Study , 1997, J. Manag. Inf. Syst..

[24]  Margaret M. Burnett,et al.  Reasoning about many-to-many requirement relationships in spreadsheets , 2002, Proceedings IEEE 2002 Symposia on Human Centric Computing Languages and Environments.

[25]  Margaret M. Burnett,et al.  Sharing reasoning about faults in spreadsheets: An empirical study , 2006, Visual Languages and Human-Centric Computing (VL/HCC'06).

[26]  G. Rothermel,et al.  An empirical study of fault localization for end-user programmers , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[27]  김종덕,et al.  Interactive. , 1996, Nursing older people.

[28]  Jácome Cunha,et al.  From relational ClassSheets to UML+OCL , 2012, SAC '12.

[29]  Kenneth S. Rogoff,et al.  Growth in a Time of Debt , 2010 .

[30]  Raymond R. Panko,et al.  Revising the Panko-Halverson taxonomy of spreadsheet errors , 2008, Decis. Support Syst..

[31]  Raymond R. Panko,et al.  Applying Code Inspection to Spreadsheet Testing , 1999, J. Manag. Inf. Syst..

[32]  Ying Chen,et al.  Visual Checking of Spreadsheets , 2008, ArXiv.

[33]  Dietmar Jannach,et al.  Model-based diagnosis of spreadsheet programs: a constraint-based debugging approach , 2016, Automated Software Engineering.

[34]  Patrick O'Beirne Spreadsheet Refactoring , 2010, ArXiv.

[35]  Jácome Cunha,et al.  End-Users Productivity in Model-Based Spreadsheets: An Empirical Study , 2011, IS-EUD.

[36]  Martin Erwig,et al.  AutoTest: A Tool for Automatic Test Case Generation in Spreadsheets , 2006, Visual Languages and Human-Centric Computing (VL/HCC'06).

[37]  John Hunt An approach for the automated risk assessment of structural differences between spreadsheets (DiffXL) , 2009, ArXiv.

[38]  Gregg Rothermel,et al.  What you see is what you test: a methodology for testing form-based visual programs , 1998, Proceedings of the 20th International Conference on Software Engineering.

[39]  Mike O'Hara,et al.  Spreadsheet Auditing Software , 2010, ArXiv.

[40]  Gregg Rothermel,et al.  Testing strategies for form-based visual programs , 1997, Proceedings The Eighth International Symposium on Software Reliability Engineering.

[41]  Markus Montigel Portability and reuse of components for spreadsheet languages , 2002, Proceedings IEEE 2002 Symposia on Human Centric Computing Languages and Environments.

[42]  Thomas C. Herndon,et al.  Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff , 2014 .

[43]  Margaret M. Burnett,et al.  Similarity inheritance: a new model of inheritance for spreadsheet VPLs , 1998, Proceedings. 1998 IEEE Symposium on Visual Languages (Cat. No.98TB100254).

[44]  Jocelyn Paine,et al.  Rapid Spreadsheet Reshaping with Excelsior: multiple drastic changes to content and layout are easy when you represent enough structure , 2008, ArXiv.

[45]  Nicholas P. Wilde A WYSIWYC (what you see is what you compute) spreadsheet , 1993, Proceedings 1993 IEEE Symposium on Visual Languages.

[46]  Yutaka Matsushita,et al.  3D interactive visualization for inter-cell dependencies of spreadsheets , 1999, Proceedings 1999 IEEE Symposium on Information Visualization (InfoVis'99).

[47]  Dennis F. Galletta,et al.  An empirical study of spreadsheet error-finding performance , 1993 .

[48]  Martin Erwig,et al.  Dimension inference in spreadsheets , 2008, 2008 IEEE Symposium on Visual Languages and Human-Centric Computing.

[49]  Martin Erwig,et al.  Gencel: a program generator for correct spreadsheets , 2006, J. Funct. Program..

[50]  D. Nardi,et al.  Automatic generation of explanations for spreadsheet applications , 1994, Proceedings of the Tenth Conference on Artificial Intelligence for Applications.

[51]  Gregg Rothermel,et al.  Interactive fault localization techniques in a spreadsheet environment , 2006, IEEE Transactions on Software Engineering.

[52]  Sumit Gulwani,et al.  Spreadsheet table transformations from examples , 2011, PLDI '11.

[53]  Gregg Rothermel,et al.  Slicing spreadsheets: an integrated methodology for spreadsheet testing and debugging , 1999, DSL '99.

[54]  Roland Mittermeir,et al.  Finding high-level structures in spreadsheet programs , 2002, Ninth Working Conference on Reverse Engineering, 2002. Proceedings..

[55]  Matthias Felleisen,et al.  Validating the unit correctness of spreadsheet programs , 2004, Proceedings. 26th International Conference on Software Engineering.

[56]  Kevin McDaid,et al.  Test-driven development: can it work for spreadsheets? , 2008, WEUSE '08.

[57]  David Ball,et al.  A Novel Approach to Formulae Production and Overconfidence Measurement to Reduce Risk in Spreadsheet Modelling , 2008, ArXiv.

[58]  Mary Shaw,et al.  The state of the art in end-user software engineering , 2011, ACM Comput. Surv..

[59]  Margaret M. Burnett,et al.  Rewarding "Good" Behavior: End-User Debugging and Rewards , 2004, 2004 IEEE Symposium on Visual Languages - Human Centric Computing.

[60]  J. Ioannidis Why Most Published Research Findings Are False , 2005, PLoS medicine.

[61]  Bay-Wei Chang,et al.  Fluid visualization of spreadsheet structures , 1998, Proceedings. 1998 IEEE Symposium on Visual Languages (Cat. No.98TB100254).

[62]  Raymond R. Panko,et al.  The Detection of Human Spreadsheet Errors by Humans versus Inspection (Auditing) Software , 2010, ArXiv.

[63]  Gregg Rothermel,et al.  Test reuse in the spreadsheet paradigm , 2002, 13th International Symposium on Software Reliability Engineering, 2002. Proceedings..

[64]  Arie van Deursen,et al.  Breviz: Visualizing Spreadsheets using Dataflow Diagrams , 2011, ArXiv.

[65]  Martin Erwig,et al.  Inferring templates from spreadsheets , 2006, ICSE '06.

[66]  Marian Petre,et al.  Usability Analysis of Visual Programming Environments: A 'Cognitive Dimensions' Framework , 1996, J. Vis. Lang. Comput..

[67]  Mouzhi Ge,et al.  Recommender Systems in Computer Science and Information Systems-a Landscape of Research , 2012 .

[68]  Gregg Rothermel,et al.  Scaling up a "What you see is what you test" methodology to spreadsheet grids , 1999, Proceedings 1999 IEEE Symposium on Visual Languages.

[69]  Jácome Cunha,et al.  From spreadsheets to relational databases and back , 2009, PEPM '09.

[70]  Clayton Lewis NoPumpG: Creating Interactive Graphics With Spreadsheet Machinery , 1987 .

[71]  Martin Erwig,et al.  Reasoning about spreadsheets with labels and dimensions , 2010, J. Vis. Lang. Comput..

[72]  Alessandro Orso,et al.  Are automated debugging techniques actually helping programmers? , 2011, ISSTA '11.

[73]  Daniel Port,et al.  End User Computing: The Dark Matter (and Dark Energy) of Corporate IT , 2012, 2012 45th Hawaii International Conference on System Sciences.

[74]  Jácome Cunha,et al.  Automatically Inferring ClassSheet Models from Spreadsheets , 2010, 2010 IEEE Symposium on Visual Languages and Human-Centric Computing.

[75]  Jocelyn Paine Excelsior: Bringing the Benefits of Modularisation to Excel , 2008, ArXiv.

[76]  Brad A. Myers,et al.  Using objects of measurement to detect spreadsheet errors , 2005, 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05).

[77]  Martin Erwig,et al.  Automatic detection of dimension errors in spreadsheets , 2009, J. Vis. Lang. Comput..

[78]  Roland Mittermeir,et al.  Spreadsheet Debugging , 2003, ArXiv.

[79]  John C. Mitchell,et al.  Concepts in programming languages , 2002 .

[80]  Nelishia Pillay,et al.  Analysis of spreadsheet errors made by computer literacy students , 2004, IEEE International Conference on Advanced Learning Technologies, 2004. Proceedings..

[81]  John Millar Carroll HCI Models, Theories, and Frameworks: Toward a Multidisciplinary Science , 2003 .

[82]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[83]  Markus Clermont A Toolkit for Scalable Spreadsheet Visualization , 2008, ArXiv.

[84]  Henry C. Lucas,et al.  Spreadsheet analysis and design , 1989, CACM.

[85]  Jácome Cunha,et al.  Model-Based Programming Environments for Spreadsheets , 2012, SBLP.

[86]  Raymond R. Panko,et al.  Spreadsheets on trial: a survey of research on spreadsheet risks , 1996, Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences.

[87]  Martin Erwig,et al.  Type inference for spreadsheets , 2006, PPDP '06.

[88]  Jocelyn Paine Spreadsheet Structure Discovery with Logic Programming , 2008, ArXiv.

[89]  Ramana Rao,et al.  The table lens: merging graphical and symbolic representations in an interactive focus + context visualization for tabular information , 1994, CHI '94.

[90]  Richard Brath,et al.  Excel Visualizer: One Click WYSIWYG Spreadsheet Visualization , 2006, Tenth International Conference on Information Visualisation (IV'06).

[91]  Markus Clermont Analyzing large spreadsheet programs , 2003, 10th Working Conference on Reverse Engineering, 2003. WCRE 2003. Proceedings..

[92]  Edward P. K. Tsang,et al.  Foundations of constraint satisfaction , 1993, Computation in cognitive science.

[93]  Felienne Hermans Improving spreadsheet test practices , 2013, CASCON.

[94]  Roland Mittermeir,et al.  Metrics-Based Spreadsheet Visualization: Support for Focused Maintenance , 2008, ArXiv.

[95]  Arie van Deursen,et al.  Automatically Extracting Class Diagrams from Spreadsheets , 2010, ECOOP.

[96]  Rui Abreu,et al.  Constraint-based Debugging of Spreadsheets , 2012, CIbSE.

[97]  Arie van Deursen,et al.  Data clone detection and visualization in spreadsheets , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[98]  Jácome Cunha,et al.  Discovery-based edit assistance for spreadsheets , 2009, 2009 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

[99]  M. Fisher,et al.  The EUSES spreadsheet corpus: a shared resource for supporting experimentation with spreadsheet dependability mechanisms , 2005, WEUSE@ICSE.

[100]  Gregg Rothermel,et al.  Automated test case generation for spreadsheets , 2002, ICSE '02.

[101]  Roland Mittermeir,et al.  A Spreadsheet Auditing Tool Evaluated in an Industrial Context , 2008, ArXiv.

[102]  Ralph H. Sprague,et al.  Hitting the wall: errors in developing and code inspecting a 'simple' spreadsheet model , 1998, Decis. Support Syst..

[103]  Jocelyn Paine Ensuring Spreadsheet Integrity with Model Master , 2008, ArXiv.

[104]  Arie van Deursen,et al.  Detecting code smells in spreadsheet formulas , 2011, 2012 28th IEEE International Conference on Software Maintenance (ICSM).

[105]  Jácome Cunha,et al.  Type-Safe Evolution of Spreadsheets , 2011, FASE.

[106]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[107]  Gregg Rothermel,et al.  Integrating automated test generation into the WYSIWYT spreadsheet testing methodology , 2006, TSEM.

[108]  Gregg Rothermel,et al.  Visually testing recursive programs in spreadsheet languages , 2001, Proceedings IEEE Symposia on Human-Centric Computing Languages and Environments (Cat. No.01TH8587).

[109]  Gregg Rothermel,et al.  End-user software engineering with assertions in the spreadsheet paradigm , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[110]  David G. Hendry,et al.  CogMap: a Visual Description Language for Spreadsheets , 1993, J. Vis. Lang. Comput..

[111]  Arie van Deursen,et al.  Supporting professional spreadsheet users by generating leveled dataflow diagrams , 2010, 2011 33rd International Conference on Software Engineering (ICSE).

[112]  Margaret M. Burnett,et al.  Adding Apples and Oranges , 2002, PADL.

[113]  Henry C. Lucas,et al.  Toward a logical/physical theory of spreadsheet modeling , 1992, TOIS.

[114]  Gregg Rothermel,et al.  WYSIWYT testing in the spreadsheet paradigm: an empirical evaluation , 2000, Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium.

[115]  James Noble,et al.  Spreadsheet Visualisation to Improve End-user Understanding , 2003, InVis.au.

[116]  Margaret M. Burnett,et al.  Designing features for both genders in end-user programming environments , 2005, 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05).

[117]  Martin Erwig,et al.  Combining spatial and semantic label analysis , 2009, 2009 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

[118]  Mark G. Simkin,et al.  Factors Affecting the Ability to Detect Spreadsheet Errors , 2006 .

[119]  Rui Abreu,et al.  Debugging Spreadsheets: A CSP-based Approach , 2012, 2012 IEEE 23rd International Symposium on Software Reliability Engineering Workshops.

[120]  Simon L. Peyton Jones,et al.  A user-centred approach to functions in Excel , 2003, ICFP '03.

[121]  Gregor Engels,et al.  ClassSheets: automatic generation of spreadsheet applications from object-oriented specifications , 2005, ASE '05.

[122]  Hugo Ribeiro,et al.  Towards a Catalog of Spreadsheet Smells , 2012, ICCSA.

[123]  Fredrik Karlsson Using two heads in practice , 2008, WEUSE '08.

[124]  Brian Knight,et al.  Quality Control in Spreadsheets: A Visual Approach using Color Codings to Reduce Errors in Formulae , 2004, Software Quality Journal.

[125]  Gerhard Fischer,et al.  Reuse-Conducive Development Environments , 2005, Automated Software Engineering.

[126]  Gregor Engels,et al.  Systematic evolution of model-based spreadsheet applications , 2012, J. Vis. Lang. Comput..

[127]  Jácome Cunha,et al.  Extension and implementation of ClassSheet models , 2012, 2012 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

[128]  Yirsaw Ayalew A visualization-based approach for improving spreadsheet quality , 2009 .

[129]  Daniele Nardi,et al.  Self-instructive spreadsheets: an environment for automatic knowledge acquisition and tutor generation , 2000, Int. J. Hum. Comput. Stud..

[130]  Raymond Reiter,et al.  A Theory of Diagnosis from First Principles , 1986, Artif. Intell..

[131]  Franz Wotawa,et al.  The Right Choice Matters! SMT Solving Substantially Improves Model-Based Debugging of Spreadsheets , 2013, 2013 13th International Conference on Quality Software.

[132]  Martin Erwig,et al.  Test-driven goal-directed debugging in spreadsheets , 2008, 2008 IEEE Symposium on Visual Languages and Human-Centric Computing.

[133]  John Morris,et al.  A Generalised Spreadsheet Verification Methodology , 2002, ACSC.

[134]  D. Jannach,et al.  Toward model-based debugging of spreadsheet programs , 2010 .

[135]  Martin Erwig,et al.  Goal-directed debugging of spreadsheets , 2005, 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05).

[136]  Clayton Lewis,et al.  Spreadsheet-based interactive graphics: from prototype to tool , 1990, CHI '90.

[137]  Joline Morrison,et al.  Using a structured design approach to reduce risks in end user spreadsheet development , 2000, Inf. Manag..

[138]  Martin Erwig,et al.  GoalDebug: A Spreadsheet Debugger for End Users , 2007, 29th International Conference on Software Engineering (ICSE'07).

[139]  Margaret M. Burnett,et al.  Exception Handling in the Spreadsheet Paradigm , 2000, IEEE Trans. Software Eng..

[140]  Margaret M. Burnett,et al.  Journal of Visual Languages & Computing Interactive, Visual Fault Localization Support for End-user Programmers This Paper Updates and Extends Earlier Work That Appeared In , 2022 .

[141]  Bill Bekenn,et al.  Reducing Spreadsheet Risk with FormulaDataSleuth , 2008, ArXiv.

[142]  Gregg Rothermel,et al.  Harnessing curiosity to increase correctness in end-user programming , 2003, CHI '03.

[143]  Glencora Borradaile,et al.  Planted-model evaluation of algorithms for identifying differences between spreadsheets , 2012, 2012 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

[144]  Markus Stumptner,et al.  Model-Based Debugging of Java Programs , 2000, AADEBUG.

[145]  Jácome Cunha,et al.  Bidirectional Transformation of Model-Driven Spreadsheets , 2012, ICMT@TOOLS.

[146]  S. Ditlea,et al.  Spreadsheets can be hazardous to your health , 1987 .

[147]  Gregg Rothermel,et al.  Scaling a Dataflow Testing Methodology to the MultiparadigmWorld of Commercial Spreadsheets , 2006, 2006 17th International Symposium on Software Reliability Engineering.

[148]  Jácome Cunha,et al.  Embedding and evolution of spreadsheet models in spreadsheet systems , 2011, 2011 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

[149]  Martin Erwig,et al.  A Type System Based on End-User Vocabulary , 2007, IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2007).

[150]  Elazar J. Pedhazur,et al.  Measurement, Design, and Analysis: An Integrated Approach , 1994 .

[151]  Yirsaw Ayalew,et al.  An end-user oriented graph-based visualization for spreadsheets , 2008, WEUSE@ICSE.

[152]  Shriram Krishnamurthi,et al.  A type system for statically detecting spreadsheet errors , 2003, 18th IEEE International Conference on Automated Software Engineering, 2003. Proceedings..

[153]  Margaret M. Burnett,et al.  Visually customizing inference rules about apples and oranges , 2002, Proceedings IEEE 2002 Symposia on Human Centric Computing Languages and Environments.

[154]  M. Erwig,et al.  Automatic generation and maintenance of correct spreadsheets , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[155]  Jorma Sajaniemi Modeling Spreadsheet Audit: A Rigorous Approach to Automatic Visualization , 2000, J. Vis. Lang. Comput..

[156]  Alan F. Blackwell,et al.  CHAPTER 5 – Notational Systems—The Cognitive Dimensions of Notations Framework , 2003 .

[157]  Dietmar Jannach,et al.  Toward an Integrated Framework for Declarative and Interactive Spreadsheet Debugging , 2013, ENASE.

[158]  Y. Chauhan,et al.  Growth in a Time of Debt , 2015 .

[159]  Martin Erwig,et al.  SheetDiff: A Tool for Identifying Changes in Spreadsheets , 2010, 2010 IEEE Symposium on Visual Languages and Human-Centric Computing.