Sharing, finding and reusing end-user code for reformatting and validating data

To help users automatically reformat and validate spreadsheets and other datasets, prior work introduced a user-extensible data model called "topes" and a supporting visual programming language. However, no support has existed to date for users to exchange and reuse topes. This functional gap results in wasteful duplication of work, as users implement topes that other people have already created. This paper presents a design for a new repository system that supports sharing and finding topes for reuse. The repository tightly integrates traditional keyword-based search with two additional search methods whose usefulness in repositories of end-user code has gone unexplored to date. The first method is "search-by-match", where a user specifies examples of data, and the repository retrieves topes that can reformat and validate that data. The second method is collaborative filtering, which has played a vital role in repositories of non-code artifacts. The repository's search functionality was tested empirically on a prototype implementation by simulating queries generated from real user spreadsheets. This experiment reveals that search-by-match and collaborative filtering greatly improve the accuracy of search over the traditional keyword-based approach, to a recall as high as 95%. These results show that search-by-match and collaborative filtering are useful approaches for helping users to publish, find, and reuse visual programs similar to topes.
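To make the search-by-match idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes, purely for illustration, that each tope can be approximated by a single regular expression (real topes are richer than this), and it ranks topes by the fraction of the user's example values they accept. The names `ToyTope`, `search_by_match`, `REPOSITORY`, and the `min_score` threshold are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTope:
    """Hypothetical stand-in for a tope: a name plus one validation regex."""
    name: str
    pattern: str

    def accepts(self, value: str) -> bool:
        # A value is accepted only if the whole (trimmed) string matches.
        return re.fullmatch(self.pattern, value.strip()) is not None


def search_by_match(repository, examples, min_score=0.5):
    """Return (tope name, score) pairs, best first.

    The score is the fraction of example values the tope accepts.
    Topes scoring below min_score (an assumed cutoff) are dropped.
    """
    ranked = []
    for tope in repository:
        matched = sum(tope.accepts(e) for e in examples)
        score = matched / len(examples) if examples else 0.0
        if score >= min_score:
            ranked.append((tope.name, score))
    # Sort by descending score, then by name for a stable order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# A toy repository; the patterns are illustrative assumptions.
REPOSITORY = [
    ToyTope("us_phone", r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
    ToyTope("us_zip", r"\d{5}(-\d{4})?"),
    ToyTope("iso_date", r"\d{4}-\d{2}-\d{2}"),
]

# Example values as a user might select them from a spreadsheet column.
results = search_by_match(
    REPOSITORY, ["412-555-0199", "(412) 555 0123", "4125550188"]
)
print(results)
```

In this sketch a query is just a list of cell values, so the user never has to guess the keywords another author used to name a tope. A fuller system might combine this score with keyword relevance and with collaborative-filtering signals, as the abstract describes.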
