Analyzing and Visualizing Spreadsheets

Spreadsheets are used extensively in industry: they are the number one tool for financial analysis and are also prevalent in other domains, such as logistics and planning. Their flexibility and immediate feedback make them easy to use for non-programmers. But as easy as spreadsheets are to build, they can be just as difficult to analyze and adapt. This dissertation aims to develop methods that help spreadsheet users understand, update, and improve spreadsheets. We took our inspiration for these methods from software engineering, as this field specializes in the analysis of data and calculations. In this dissertation, we looked at four different aspects of spreadsheets: metadata, structure, formulas, and data. We found that methods from software engineering can be applied to spreadsheets very well, and that these methods support end users in working with spreadsheets.
