Crowdsourcing with Multiple-Source Knowledge Transfer

Crowdsourcing is a new computing paradigm that harnesses human effort to solve problems that are hard for computers. Budget and quality are two fundamental factors in crowdsourcing, but they are in tension, and balancing them is crucial. Induction and inference are principled ways for humans to acquire knowledge, and transfer learning can support similar induction and inference processes. When a new task arrives, we may not know how to approach it; however, we may have easy access to relevant knowledge that can help with it. Through appropriate knowledge transfer, a better annotation of the task can therefore be obtained at a small cost. To make this idea concrete, we introduce Crowdsourcing with Multiple-source Knowledge Transfer (CrowdMKT), an approach that transfers knowledge from multiple similar but different domains to a new task while reducing the negative impact of irrelevant sources. CrowdMKT first uses knowledge transfer from multiple sources to learn a set of concentrated high-level feature vectors for the tasks, and then introduces a probabilistic graphical model that jointly models the tasks (through their high-level features), the workers, and their annotations. Finally, it adopts an EM algorithm to estimate the workers' strengths and the consensus labels. Experimental results on real-world image and text datasets demonstrate the effectiveness of CrowdMKT in improving annotation quality and reducing the budget.
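The final step above, using EM to estimate worker strengths and consensus labels, can be illustrated with a minimal sketch. The code below is a generic one-coin, Dawid-Skene-style EM and is not CrowdMKT's model: it assumes each worker has a single accuracy, that a wrong answer is spread uniformly over the other classes, and it ignores both the high-level task features and the multi-source transfer step. The function name `em_consensus` and its parameters are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def em_consensus(annotations, num_classes, num_iters=50, tol=1e-6):
    """One-coin EM for crowd consensus (a generic stand-in, not CrowdMKT's model).

    annotations: iterable of (task_id, worker_id, label), label in range(num_classes).
    Returns (posteriors, accuracies): posteriors[task] is a list of class
    probabilities; accuracies[worker] is the estimated chance the worker is correct.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    annotations = list(annotations)
    tasks = list(dict.fromkeys(t for t, _, _ in annotations))
    workers = list(dict.fromkeys(w for _, w, _ in annotations))
    by_task = defaultdict(list)
    for t, w, y in annotations:
        by_task[t].append((w, y))

    # Initialise the consensus posteriors with soft majority voting.
    post = {}
    for t in tasks:
        counts = [0.0] * num_classes
        for _, y in by_task[t]:
            counts[y] += 1.0
        post[t] = [c / len(by_task[t]) for c in counts]

    acc = {w: 0.5 for w in workers}
    for _ in range(num_iters):
        # M-step: a worker's accuracy is the expected fraction of their labels
        # that match the (soft) consensus, with Laplace smoothing.
        hits, seen = defaultdict(float), defaultdict(float)
        for t, w, y in annotations:
            hits[w] += post[t][y]
            seen[w] += 1.0
        new_acc = {w: (hits[w] + 1.0) / (seen[w] + 2.0) for w in workers}
        # Smoothed class prior estimated from the current posteriors.
        prior = [(sum(post[t][k] for t in tasks) + 1.0) / (len(tasks) + num_classes)
                 for k in range(num_classes)]

        # E-step: posterior over each task's true label; a worker with accuracy a
        # gives the true class with prob. a, each other class with (1 - a)/(K - 1).
        new_post = {}
        for t in tasks:
            scores = []
            for k in range(num_classes):
                p = prior[k]
                for w, y in by_task[t]:
                    a = new_acc[w]
                    p *= a if y == k else (1.0 - a) / (num_classes - 1)
                scores.append(p)
            z = sum(scores)
            new_post[t] = [s / z for s in scores]

        # Stop once worker accuracies no longer change appreciably.
        delta = max(abs(new_acc[w] - acc[w]) for w in workers)
        acc, post = new_acc, new_post
        if delta < tol:
            break
    return post, acc


if __name__ == "__main__":
    truth = [0, 1, 0, 1, 1, 0]
    ann = []
    for t, y in enumerate(truth):
        ann += [(t, "a", y), (t, "b", y), (t, "c", 1 - y)]  # worker c always disagrees
    post, acc = em_consensus(ann, num_classes=2)
    print([max(range(2), key=lambda k: post[t][k]) for t in range(len(truth))])
    print({w: round(a, 3) for w, a in acc.items()})
```

With two workers who agree with the truth and one who always gives the opposite label, this sketch recovers the labels `[0, 1, 0, 1, 1, 0]` and assigns worker `c` a low accuracy. A model in the spirit of the abstract would additionally condition the label posterior on the transferred high-level task features, which this simplified version does not attempt.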