Topes

Programmers often omit input validation when inputs can appear in many different formats or when validation criteria cannot be precisely specified. To enable validation in these situations, we present a new technique that puts valid inputs into a consistent format and identifies "questionable" inputs, which might be either valid or invalid, so that a person or a program can double-check them. Our technique relies on the concept of a "tope", an application-independent abstraction describing how to recognize and transform values in a category of data. We present our definition of topes and describe a development environment that supports implementing and using topes. Experiments with web application and spreadsheet data indicate that our technique improves the accuracy and reusability of validation code, as well as the effectiveness of subsequent data cleaning tasks such as duplicate identification.
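The abstract describes a tope's two roles only in prose: recognizing values in a category (including questionable ones) and transforming them into a consistent format. The following Python code is a minimal sketch of those roles for a hypothetical US phone-number category. It is not the authors' implementation or the API of their development environment. The format list, the scoring heuristic (a 555 exchange is treated as questionable), the 0/1 score thresholds, and the names `Format`, `recognize`, `classify`, `normalize`, and `find_duplicates` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Format:
    """One way of writing values in the category (hypothetical structure)."""
    name: str
    pattern: str  # full-match regex with named groups: area, exchange, line


# Hypothetical formats for a US phone-number tope.
PHONE_FORMATS: List[Format] = [
    Format("parenthesized", r"\((?P<area>\d{3})\) ?(?P<exchange>\d{3})-(?P<line>\d{4})"),
    Format("dashed", r"(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})"),
    Format("dotted", r"(?P<area>\d{3})\.(?P<exchange>\d{3})\.(?P<line>\d{4})"),
]


def score(parts: Dict[str, str]) -> float:
    """Assumed heuristic: 1.0 = valid, 0.0 = invalid, in between = questionable."""
    if parts["area"][0] in "01":
        return 0.0  # NANP area codes never start with 0 or 1
    if parts["exchange"] == "555":
        return 0.5  # 555 numbers are often placeholders, so flag for review
    return 1.0


def recognize(value: str) -> Tuple[float, Optional[Dict[str, str]]]:
    """Try every format and keep the best score and its parsed parts."""
    best, best_parts = 0.0, None
    for fmt in PHONE_FORMATS:
        match = re.fullmatch(fmt.pattern, value.strip())
        if match:
            s = score(match.groupdict())
            if s > best:
                best, best_parts = s, match.groupdict()
    return best, best_parts


def classify(value: str) -> str:
    """Map the recognizer's score to valid / questionable / invalid."""
    s, _ = recognize(value)
    if s == 1.0:
        return "valid"
    if s == 0.0:
        return "invalid"
    return "questionable"


def normalize(value: str) -> Optional[str]:
    """Transform a non-invalid value into the canonical dashed format."""
    s, parts = recognize(value)
    if parts is None or s == 0.0:
        return None
    return f"{parts['area']}-{parts['exchange']}-{parts['line']}"


def find_duplicates(values: List[str]) -> Dict[str, List[str]]:
    """Group inputs that normalize to the same canonical value."""
    groups: Dict[str, List[str]] = {}
    for v in values:
        key = normalize(v)
        if key is not None:
            groups.setdefault(key, []).append(v)
    return {k: vs for k, vs in groups.items() if len(vs) > 1}


if __name__ == "__main__":
    inputs = ["(412) 268-1234", "412.268.1234", "412-555-0199", "012-268-1234", "call me"]
    for v in inputs:
        print(f"{v!r}: {classify(v)}, normalized={normalize(v)!r}")
    print(find_duplicates(inputs))
```

Because "(412) 268-1234" and "412.268.1234" both normalize to "412-268-1234", `find_duplicates` groups them. This illustrates why putting values into a consistent format helps later data cleaning. The 555 number is kept rather than rejected, so it could be routed to a person or another program for double-checking.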
