Deco: declarative crowdsourcing

Crowdsourcing enables programmers to incorporate "human computation" as a building block in algorithms that cannot be fully automated, such as text analysis and image recognition. Similarly, humans can be used as a building block in data-intensive applications--providing, comparing, and verifying data used by applications. Building upon the decades-long success of declarative approaches to conventional data management, we use a similar approach for data-intensive applications that incorporate humans. Specifically, declarative queries are posed over stored relational data as well as data computed on-demand from the crowd, and the underlying system orchestrates the computation of query answers. We present Deco, a database system for declarative crowdsourcing. We describe Deco's data model, query language, and our prototype. Deco's data model was designed to be general (it can be instantiated to other proposed models), flexible (it allows methods for data cleansing and external access to be plugged in), and principled (it has a precisely-defined semantics). Syntactically, Deco's query language is a simple extension to SQL. Based on Deco's data model, we define a precise semantics for arbitrary queries involving both stored data and data obtained from the crowd. We then describe the Deco query processor which uses a novel push-pull hybrid execution model to respect the Deco semantics while coping with the unique combination of latency, monetary cost, and uncertainty introduced in the crowdsourcing environment. Finally, we experimentally explore the query processing alternatives provided by Deco using our current prototype.

[1]  Christopher Ré,et al.  MYSTIQ: a system for finding more answers by using probabilities , 2005, SIGMOD '05.

[2]  Benjamin B. Bederson,et al.  Human computation: a survey and taxonomy of a growing field , 2011, CHI.

[3]  Jeffrey F. Naughton,et al.  Query execution techniques for caching expensive methods , 1996, SIGMOD '96.

[4]  Jeffrey F. Naughton,et al.  Efficiently incorporating user feedback into information extraction and integration programs , 2009, SIGMOD Conference.

[5]  Hector Garcia-Molina,et al.  Turkalytics: analytics for human computation , 2011, WWW.

[6]  Jennifer Widom,et al.  Trio: A System for Integrated Management of Data, Accuracy, and Lineage , 2004, CIDR.

[7]  SuciuDan,et al.  Query optimization in the presence of limited access patterns , 1999 .

[8]  Jennifer Widom,et al.  Human-assisted graph search: it's okay to ask questions , 2011, Proc. VLDB Endow..

[9]  Inderpal Singh Mumick,et al.  Deriving Production Rules For Incremental View Maintenance , 1999 .

[10]  Lydia B. Chilton,et al.  The labor economics of paid crowdsourcing , 2010, EC '10.

[11]  Duncan J. Watts,et al.  Financial incentives and the "performance of crowds" , 2009, HCOMP '09.

[12]  Ioana Manolescu,et al.  Query optimization in the presence of limited access patterns , 1999, SIGMOD '99.

[13]  Jennifer Widom,et al.  Deco: A System for Declarative Crowdsourcing , 2012, Proc. VLDB Endow..

[14]  Roy Goldman,et al.  WSQ/DSQ: a practical approach for combined querying of databases and the Web , 2000, SIGMOD '00.

[15]  Maurizio Lenzerini,et al.  Data integration: a theoretical perspective , 2002, PODS.

[16]  Alon Y. Halevy,et al.  Crowdsourcing systems on the World-Wide Web , 2011, Commun. ACM.

[17]  David R. Karger,et al.  Demonstration of Qurk: a query processor for humanoperators , 2011, SIGMOD '11.

[18]  Tim Kraska,et al.  CrowdDB: answering queries with crowdsourcing , 2011, SIGMOD '11.

[19]  Jennifer Widom,et al.  CrowdScreen: algorithms for filtering data with humans , 2012, SIGMOD Conference.

[20]  David R. Karger,et al.  Human-powered Sorts and Joins , 2011, Proc. VLDB Endow..

[21]  Michael Stonebraker,et al.  Predicate migration: optimizing queries with expensive predicates , 1992, SIGMOD Conference.

[22]  Jennifer Widom,et al.  Query Processing over Crowdsourced Data , 2012 .

[23]  Aditya G. Parameswaran,et al.  Answering Queries using Humans, Algorithms and Databases , 2011, CIDR.

[24]  Frank Wm. Tompa,et al.  Efficiently updating materialized views , 1986, SIGMOD '86.

[25]  Joann J. Ordille,et al.  Querying Heterogeneous Information Sources Using Source Descriptions , 1996, VLDB.

[26]  Dan Olteanu,et al.  MayBMS: Managing Incomplete Information with Probabilistic World-Set Decompositions , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[27]  Edward Y. Chang,et al.  Query planning with limited source capabilities , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[28]  Kyuseok Shim,et al.  Query Optimization in the Presence of Foreign Functions , 1993, VLDB.

[29]  Prithviraj Sen,et al.  Representing and Querying Correlated Tuples in Probabilistic Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[30]  AnHai Doan,et al.  Matching Schemas in Online Communities: A Web 2.0 Approach , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[31]  Aditya G. Parameswaran,et al.  So who won?: dynamic max discovery with the crowd , 2012, SIGMOD Conference.

[32]  Rob Miller,et al.  Crowdsourced Databases: Query Processing with People , 2011, CIDR.

[33]  Lydia B. Chilton,et al.  TurKit: Tools for iterative tasks on mechanical turk , 2009, 2009 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

[34]  Tim Kraska,et al.  CrowdDB: Query Processing with the VLDB Crowd , 2011, Proc. VLDB Endow..