CrowdSky: Skyline Computation with Crowdsourcing

In this paper, we propose a crowdsourcing-based approach to answering skyline queries over incomplete data. Our main idea is to leverage crowds to infer the pairwise preferences between tuples when the values of some attributes are unknown. Specifically, our proposed solution considers three key factors used in existing crowd-enabled algorithms: (1) minimizing the monetary cost of identifying a crowdsourced skyline by using a dominating set, (2) reducing latency, measured as the number of rounds, by parallelizing the questions asked to crowds, and (3) improving the accuracy of a crowdsourced skyline by dynamically assigning the number of crowd workers per question. We evaluate our solution over both simulated and real crowdsourcing using Amazon Mechanical Turk. Compared to a sort-based baseline method, our solution significantly reduces the monetary cost and reduces the number of rounds by up to two orders of magnitude. In addition, our dynamic majority voting method shows higher accuracy than both the static majority voting method and the existing solution using unary questions.
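To make the setting concrete, the following is a minimal sketch of a skyline query in which one attribute is missing and is resolved by pairwise questions. It is not the algorithm proposed in the paper: it uses a naive quadratic scan, and the names `crowd_skyline`, `known_no_worse`, and `ask_crowd`, as well as the sample data and the simulated crowd, are assumptions made for illustration. The only cost-saving idea it shows is that a question is asked only when one tuple is already no worse than the other on every known attribute, since otherwise dominance is impossible.

```python
from typing import Callable, List, Sequence, Tuple

# (name, known attribute values); smaller values are better in this sketch.
Item = Tuple[str, Sequence[float]]


def known_no_worse(a: Item, b: Item) -> bool:
    """True if a is no worse than b on every known attribute."""
    return all(x <= y for x, y in zip(a[1], b[1]))


def crowd_skyline(
    items: List[Item], ask_crowd: Callable[[str, str], bool]
) -> Tuple[List[str], int]:
    """Naive skyline with one missing attribute judged by pairwise questions.

    ask_crowd(a, b) is a hypothetical oracle that returns True when a is
    strictly preferred to b on the missing attribute. Returns the skyline
    names and the number of questions asked.
    """
    questions = 0
    skyline: List[str] = []
    for b in items:
        dominated = False
        for a in items:
            # Skip pairs where a cannot dominate b on the known attributes.
            if a is b or not known_no_worse(a, b):
                continue
            questions += 1
            if ask_crowd(a[0], b[0]):
                dominated = True
                break
        if not dominated:
            skyline.append(b[0])
    return skyline, questions


# Hypothetical data: (price, distance) are known; quality is hidden and is
# only used here to simulate crowd answers (higher quality is preferred).
hotels: List[Item] = [
    ("A", (100, 2.0)),
    ("B", (120, 3.0)),
    ("C", (90, 4.0)),
    ("D", (130, 3.5)),
]
hidden_quality = {"A": 5, "B": 3, "C": 4, "D": 9}


def simulated_crowd(a: str, b: str) -> bool:
    return hidden_quality[a] > hidden_quality[b]


result, asked = crowd_skyline(hotels, simulated_crowd)
print(result, asked)  # ['A', 'C', 'D'] 3
```

In this example, D is worse than A and B on both known attributes but stays in the skyline because the simulated crowd prefers it on the missing attribute. A real deployment would replace `simulated_crowd` with answers aggregated from several workers per question, which is where a voting scheme would come in.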
