DataSift: An Expressive and Accurate Crowd-Powered Search Toolkit

Traditional information retrieval systems have limited functionality. For instance, they cannot adequately support queries containing non-textual fragments such as images or videos, queries that are very long or ambiguous, or semantically rich queries over non-textual corpora. In this paper, we present DataSift, an expressive and accurate crowd-powered search toolkit that can connect to any corpus. We provide a number of alternative configurations for DataSift using crowdsourced and automated components, and demonstrate gains of 2–3x in precision over traditional retrieval schemes in experiments on real corpora. We also present our results on determining suitable values for the parameters in those configurations, along with a number of insights learned along the way.
