Asking the Right Questions in Crowd Data Sourcing

Crowd-based data sourcing is a new and powerful data procurement paradigm that engages Web users to collectively contribute information. In this work, we target the problem of gathering data from the crowd in an economical and principled fashion. We present Ask It!, a system that allows interactive data sourcing applications to effectively determine which questions should be directed to which users for reducing the uncertainty about the collected data. Ask It! uses a set of novel algorithms for minimizing the number of probing (questions) required from the different users. We demonstrate the challenge and our solution in the context of a multiple-choice question game played by the ICDE'12 attendees, targeted to gather information on the conference's publications, authors and colleagues.

[1]  Aditya G. Parameswaran,et al.  Answering Queries using Humans, Algorithms and Databases , 2011, CIDR.

[2]  Carole A. Goble,et al.  Data Lineage Model for Taverna Workflows with Lightweight Annotation Requirements , 2008, IPAW.

[3]  Michael D. Buhrmester,et al.  Amazon's Mechanical Turk , 2011, Perspectives on psychological science : a journal of the Association for Psychological Science.

[4]  Thomas Heinis,et al.  Efficient lineage tracking for scientific workflows , 2008, SIGMOD Conference.

[5]  Haim Kaplan,et al.  Improved recommendations via (more) collaboration , 2010, WebDB '10.

[6]  James Bennett,et al.  The Netflix Prize , 2007 .

[7]  Edward A. Lee,et al.  Scientific workflow management and the Kepler system , 2006, Concurr. Comput. Pract. Exp..

[8]  Ohad Shamir,et al.  Vox Populi: Collecting High-Quality Labels from a Crowd , 2009, COLT.

[9]  Taghi M. Khoshgoftaar,et al.  A Survey of Collaborative Filtering Techniques , 2009, Adv. Artif. Intell..

[10]  Jennifer Widom,et al.  Panda: A System for Provenance and Data , 2010, IEEE Data Eng. Bull..

[11]  Carmem S. Hara,et al.  Querying and Managing Provenance through User Views in Scientific Workflows , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[12]  Tim Kraska,et al.  CrowdDB: answering queries with crowdsourcing , 2011, SIGMOD '11.

[13]  Daniel Deutch,et al.  Using Markov Chain Monte Carlo to play Trivia , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[14]  Matthew R. Pocock,et al.  Taverna: a tool for the composition and enactment of bioinformatics workflows , 2004, Bioinform..