Building Mashups by Demonstration

The latest generation of WWW tools and services enables Web users to generate applications that combine content from multiple sources. This type of Web application is referred to as a mashup. Many of the tools for constructing mashups rely on a widget paradigm, where users must select, customize, and connect widgets to build the desired application. While this approach does not require programming, users must still understand programming concepts to create a mashup successfully. As a result, they are put off by the time, effort, and expertise needed to build one. In this article, we describe our programming-by-demonstration approach to building mashups by example. Instead of selecting and customizing a set of widgets, the user simply demonstrates the integration task by example. Our approach addresses the problems of extracting data from Web sources, cleaning and modeling the extracted data, and integrating the data across sources. We implemented these ideas in a system called Karma and evaluated it with 23 users. The results show that, compared to other mashup construction tools, Karma allows more users to build mashups successfully and to build them significantly faster than with a widget-based approach.
