CrowdDB: answering queries with crowdsourcing

Some queries cannot be answered by machines only. Processing such queries requires human input for providing information that is missing from the database, for performing computationally difficult functions, and for matching, ranking, or aggregating results based on fuzzy criteria. CrowdDB uses human input via crowdsourcing to process queries that neither database systems nor search engines can adequately answer. It uses SQL both as a language for posing complex queries and as a way to model data. While CrowdDB leverages many aspects of traditional database systems, there are also important differences. Conceptually, a major change is that the traditional closed-world assumption for query processing does not hold for human input. From an implementation perspective, human-oriented query operators are needed to solicit, integrate and cleanse crowdsourced data. Furthermore, performance and cost depend on a number of new factors including worker affinity, training, fatigue, motivation and location. We describe the design of CrowdDB, report on an initial set of experiments using Amazon Mechanical Turk, and outline important avenues for future work in the development of crowdsourced query processing systems.
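To make the idea concrete, below is a minimal sketch of the kind of SQL extension (CrowdSQL) the paper describes: CROWD-annotated columns model data the crowd may supply when it is missing, and crowd operators express fuzzy comparison and subjective ranking. The exact keywords, table names, and question templates here are illustrative assumptions, not verbatim syntax from the paper.

    -- Sketch only: a CROWD column marks a value the crowd may fill in when it
    -- is missing from the database (the closed-world assumption no longer holds).
    CREATE TABLE Department (
      university VARCHAR(100),
      name       VARCHAR(100),
      url        CROWD VARCHAR(100),   -- may be crowdsourced on demand
      phone      VARCHAR(32),
      PRIMARY KEY (university, name)
    );

    -- Fuzzy matching: a crowdsourced equality predicate (written ~= here)
    -- asks workers whether two values refer to the same thing.
    SELECT url
    FROM   Department
    WHERE  name ~= 'computer science';

    -- Subjective ranking: the crowd orders results by a fuzzy criterion.
    SELECT p.image
    FROM   Picture p
    WHERE  p.subject = 'Golden Gate Bridge'
    ORDER BY CROWDORDER(p, 'Which picture best shows the subject?');

Queries like these compile into plans that mix ordinary relational operators with human-oriented operators, which generate microtasks (e.g., on Amazon Mechanical Turk), collect worker answers, and integrate and cleanse them before returning results.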
