Skyline in Crowdsourcing with Imprecise Comparisons

Given a set of objects, each represented as a vector of features in a feature space, the skyline problem asks for the subset of objects that are not dominated by any other input object. An example application is finding the best hotels with respect to several features (location, price, cleanliness, etc.). Crowdsourcing is useful for this problem when scores for the items' features are not available, yet the crowd can give inconsistent answers. In this paper we study the computation of the skyline when the comparisons between objects are performed by humans. We model the problem using the threshold model [Ajtai et al., TALG 2015], in which comparing two objects may produce errors or inconsistencies if the objects are close to each other. We provide algorithms for the problem, analyze the number of human comparisons they require, and give lower bounds. We also evaluate the effectiveness and efficiency of our algorithms using synthetic and real-world data.
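To make the two notions in the abstract concrete, here is a minimal Python sketch. It is an illustration and not the paper's algorithm. It assumes that larger feature values are better and that every comparison is exact when computing the skyline. The names `dominates`, `skyline`, `threshold_compare`, and the parameter `delta` are hypothetical. The coin flip used for close values is only one possible behaviour, since the threshold model treats the answer for close objects as unreliable.

```python
import random
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every feature
    and strictly better on at least one (assumption: larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive quadratic skyline: keep each point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def threshold_compare(x: float, y: float, delta: float, rng: random.Random) -> bool:
    """Simulated imprecise comparison on a single feature.

    Returns True if x is judged larger than y. The answer is correct when
    the values differ by more than delta; otherwise it is unreliable, which
    this sketch simulates with a coin flip (an assumption made here).
    """
    if abs(x - y) > delta:
        return x > y
    return rng.random() < 0.5


# Hypothetical hotels scored on two features, e.g. (location, cleanliness).
hotels = [(9, 3), (7, 7), (3, 9), (6, 6), (2, 2)]
print(skyline(hotels))
```

In this toy data, `(6, 6)` and `(2, 2)` are dominated by `(7, 7)`, so they drop out of the skyline. Replacing the exact per-feature comparisons inside `dominates` with calls to something like `threshold_compare` is what makes the crowdsourced setting hard: answers for close objects can contradict each other.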
