Efficient algorithms for crowd-aided categorization

We study the problem of using human intelligence to categorize a large number of objects. Given a category hierarchy and a set of objects, we can ask humans whether an object belongs to a category. Our goal is to find a strategy that locates the appropriate category in the hierarchy for each object while minimizing the cost, i.e., the number of questions asked. This problem has many important applications, including image classification and product categorization. We develop an online framework in which the category distribution is gradually learned, so that an effective order of questions is adaptively determined. We prove that the problem is computationally intractable even if the true category distribution is known in advance. We develop an approximation algorithm and prove that it achieves an approximation factor of 2. We also show that there is a fully polynomial-time approximation scheme for the problem. Furthermore, we propose an online strategy that achieves nearly the same performance guarantee as the offline optimal strategy, even with no prior knowledge of the category distribution. Experiments on a real crowdsourcing platform demonstrate the effectiveness of our method.
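To make the problem setting concrete, the following is a minimal sketch of the question model and its cost measure. It is not the approximation algorithm or the online strategy described above. It is a naive top-down baseline, and everything in it is an assumption made for illustration: the toy hierarchy `HIERARCHY`, the helper names `descendants`, `make_oracle`, and `top_down_categorize`, and the interpretation that an object "belongs to" a category when its true category lies in that category's subtree. The simulated human is also assumed to answer without error.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy hierarchy: each category maps to its child categories.
HIERARCHY: Dict[str, List[str]] = {
    "root": ["electronics", "clothing"],
    "electronics": ["phones", "laptops"],
    "clothing": ["shirts"],
    "phones": [],
    "laptops": [],
    "shirts": [],
}


def descendants(tree: Dict[str, List[str]], node: str) -> Set[str]:
    """Return the node together with every category beneath it."""
    found = {node}
    for child in tree[node]:
        found |= descendants(tree, child)
    return found


def make_oracle(tree: Dict[str, List[str]], target: str) -> Callable[[str], bool]:
    """Simulate a noise-free human for an object whose true category is `target`.

    Assumption: the answer is "yes" exactly when the true category
    lies in the subtree rooted at the asked category.
    """
    def ask(category: str) -> bool:
        return target in descendants(tree, category)
    return ask


def top_down_categorize(
    tree: Dict[str, List[str]], root: str, ask: Callable[[str], bool]
) -> Tuple[str, int]:
    """Naive baseline: descend from the root, asking about children in order.

    Returns the located category and the number of questions asked (the cost).
    """
    current, questions = root, 0
    while True:
        for child in tree[current]:
            questions += 1
            if ask(child):
                current = child
                break
        else:
            # No child contains the object, so `current` is its category.
            return current, questions


if __name__ == "__main__":
    category, cost = top_down_categorize(
        HIERARCHY, "root", make_oracle(HIERARCHY, "laptops")
    )
    print(category, cost)
```

In this baseline the order in which children are asked is fixed. The cost depends on that order, which is why knowing or learning how objects are distributed over categories can reduce the number of questions.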
