Automatic detection of dimension errors in spreadsheets

We present a reasoning system for inferring dimension information in spreadsheets. This system can be used to check the consistency of spreadsheet formulas and thus is able to detect errors in spreadsheets. Our approach is based on three static analysis components. First, the spatial structure of the spreadsheet is analyzed to infer a labeling relationship among cells. Second, cells that are used as labels are lexically analyzed and mapped to potential dimensions. Finally, dimension information is propagated through spreadsheet formulas. An important aspect of the rule system defining dimension inference is that it works bi-directionally, that is, not only ''downstream'' from referenced arguments to the current cell, but also ''upstream'' in the reverse direction. This flexibility makes the system robust and turns out to be particularly useful in cases when the initial dimension information that can be inferred from headers is incomplete or ambiguous. We have implemented a prototype system as an add-in to Excel. In an evaluation of this implementation we were able to detect dimension errors in almost 50% of the investigated spreadsheets, which shows (i) that the system works reliably in practice and (ii) that dimension information can be well exploited to uncover errors in spreadsheets.

[1]  S. Ditlea,et al.  Spreadsheets can be hazardous to your health , 1987 .

[2]  Gregg Rothermel,et al.  A methodology for testing spreadsheets , 2001, TSEM.

[3]  Martin Erwig,et al.  A Type System Based on End-User Vocabulary , 2007, IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2007).

[4]  Shriram Krishnamurthi,et al.  A type system for statically detecting spreadsheet errors , 2003, 18th IEEE International Conference on Automated Software Engineering, 2003. Proceedings..

[5]  D. Isbell,et al.  Mars Climate Orbiter Team Finds Likely Cause Of Loss , 2007 .

[6]  Martin Erwig,et al.  Visual specifications of correct spreadsheets , 2005, 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05).

[7]  Gregor Engels,et al.  ClassSheets - model-based, object-oriented design of spreadsheet applications , 2007, J. Object Technol..

[8]  Mary Shaw,et al.  Estimating the numbers of end users and end user programmers , 2005, 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05).

[9]  Margaret M. Burnett,et al.  Adding Apples and Oranges , 2002, PADL.

[11]  Martin Erwig,et al.  Dimension inference in spreadsheets , 2008, 2008 IEEE Symposium on Visual Languages and Human-Centric Computing.

[12]  Gregor Engels,et al.  ClassSheets: automatic generation of spreadsheet applications from object-oriented specifications , 2005, ASE '05.

[13]  Margaret M. Burnett,et al.  Garbage in, garbage out? An empirical look at oracle mistakes by end-user programmers , 2005, 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05).

[14]  Henry C. Lucas,et al.  Toward a logical/physical theory of spreadsheet modeling , 1992, TOIS.

[15]  Elaine J. Weyuker,et al.  Comparison of program testing strategies , 1991, TAV4.

[16]  Brad A. Myers,et al.  Using objects of measurement to detect spreadsheet errors , 2005, 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05).

[17]  Gregg Rothermel,et al.  Testing Homogeneous Spreadsheet Grids with the "What You See Is What You Test" Methodology , 2002, IEEE Trans. Software Eng..

[18]  Henry C. Lucas,et al.  Spreadsheet analysis and design , 1989, CACM.

[19]  Martin Erwig,et al.  Gencel: a program generator for correct spreadsheets , 2006, J. Funct. Program..

[20]  Brian Knight,et al.  Quality control in spreadsheets: a software engineering-based approach to spreadsheet development , 2000, Proceedings of the 33rd Annual Hawaii International Conference on System Sciences.

[21]  Stephen G. Powell,et al.  The Art of Modeling with Spreadsheets , 2003 .

[22]  Martin Erwig,et al.  Test-driven goal-directed debugging in spreadsheets , 2008, 2008 IEEE Symposium on Visual Languages and Human-Centric Computing.

[23]  Martin Erwig,et al.  GoalDebug: A Spreadsheet Debugger for End Users , 2007, 29th International Conference on Software Engineering (ICSE'07).

[24]  Martin Erwig,et al.  UCheck: A spreadsheet type checker for end users , 2007, J. Vis. Lang. Comput..

[25]  Jorma Sajaniemi Modeling Spreadsheet Audit: A Rigorous Approach to Automatic Visualization , 2000, J. Vis. Lang. Comput..

[26]  Gregg Rothermel,et al.  The EUSES spreadsheet corpus: a shared resource for supporting experimentation with spreadsheet dependability mechanisms , 2005, ACM SIGSOFT Softw. Eng. Notes.

[27]  Martin Erwig,et al.  AutoTest: A Tool for Automatic Test Case Generation in Spreadsheets , 2006, Visual Languages and Human-Centric Computing (VL/HCC'06).

[28]  David Tufte A New and Different Spreadsheet Text The Art of Modeling with Spreadsheets: Management Science, Spreadsheet Engineering and Modeling Craft by Stephen G. Powell and Kenneth R. Baker , 2004 .

[29]  Martin Erwig,et al.  Mutation Operators for Spreadsheets , 2009, IEEE Transactions on Software Engineering.

[30]  Martin Erwig,et al.  Header and Unit Inference for Spreadsheets Through Spatial Analyses , 2004, 2004 IEEE Symposium on Visual Languages - Human Centric Computing.

[31]  Margaret M. Burnett,et al.  Sharing reasoning about faults in spreadsheets: An empirical study , 2006, Visual Languages and Human-Centric Computing (VL/HCC'06).

[32]  Alan G. Yoder,et al.  Real spreadsheets for real programmers , 1994, Proceedings of 1994 IEEE International Conference on Computer Languages (ICCL'94).

[33]  Raymond R. Panko,et al.  Applying Code Inspection to Spreadsheet Testing , 1999, J. Manag. Inf. Syst..

[34]  Hewijin Christine Jiau,et al.  Comments on "On Object Systems and Behavior Inheritance" , 2003, IEEE Trans. Software Eng..

[35]  Gregg Rothermel,et al.  Automated test case generation for spreadsheets , 2002, ICSE '02.

[36]  Martin Erwig,et al.  Automatic generation and maintenance of correct spreadsheets , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[37]  Roland Mittermeir,et al.  Finding high-level structures in spreadsheet programs , 2002, Ninth Working Conference on Reverse Engineering, 2002. Proceedings..

[38]  Matthias Felleisen,et al.  Validating the unit correctness of spreadsheet programs , 2004, Proceedings. 26th International Conference on Software Engineering.

[39]  Benjamin C. Pierce,et al.  Types and programming languages: the next generation , 2003, 18th Annual IEEE Symposium of Logic in Computer Science, 2003. Proceedings..