SGUARD: A Feature-Based Clustering Tool for Effective Spreadsheet Defect Detection

Spreadsheets are widely used but prone to various defects. In this paper, we present SGUARD, a tool for effectively detecting spreadsheet defects. SGUARD learns spreadsheet features to cluster cells with similar computational semantics, and then refines these clusters to identify anomalous cells as defects. SGUARD strikes a good balance between precision (87.8%) and recall (71.9%) in defect detection, achieving an F-measure of 0.79, which exceeds that of existing spreadsheet defect detection techniques. We introduce the SGUARD implementation and its usage in a video presentation (https://youtu.be/gNPmMvQVf5Q), and provide a public download repository (https://github.com/sheetguard/sguard).
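
The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the F-measure is the standard balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the formula, and the function name `f_measure` is only illustrative, not part of SGUARD.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's F-measure is the standard F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
precision = 0.878
recall = 0.719

score = f_measure(precision, recall)
print(f"F-measure: {score:.2f}")  # prints "F-measure: 0.79"
```

Under this assumption, the computed value agrees with the 0.79 stated in the abstract.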
