T-Crowd: Effective Crowdsourcing for Tabular Data

We study how to use crowdsourcing effectively to fill in missing values in a given relation (e.g., a table containing attributes of celebrities, such as nationality and age). A task given to a worker typically consists of questions about missing attribute values (e.g., what is the age of Jet Li?). Existing work often treats related attributes independently, leading to suboptimal performance. We present T-Crowd, a crowdsourcing system that takes attribute relationships into account. T-Crowd integrates each worker's answers on different attributes to learn both the worker's trustworthiness and the true data values. Our solution supports both categorical and continuous attributes. Experiments on real datasets show that T-Crowd outperforms state-of-the-art methods, improving the quality of truth inference.
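
To make the idea of a single worker quality shared across attribute types concrete, the following is a minimal sketch and not T-Crowd's actual model. It assumes a generic iterative truth-discovery loop: weighted voting for categorical cells, weighted averaging for continuous cells, and one weight per worker derived from that worker's errors on all attributes. The function name `infer_truth`, the error normalization, and the toy answers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import pstdev


def infer_truth(answers, attr_types, n_iters=10):
    """Illustrative sketch (not the paper's algorithm).

    answers: list of (worker, row, attr, value)
    attr_types: attr -> "categorical" or "continuous"
    """
    weight = {w: 1.0 for w, _, _, _ in answers}

    # Group answers per cell; collect continuous values per attribute
    # so that errors on different attributes share a comparable scale.
    by_cell = defaultdict(list)
    by_attr = defaultdict(list)
    for w, row, attr, val in answers:
        by_cell[(row, attr)].append((w, val))
        if attr_types[attr] == "continuous":
            by_attr[attr].append(float(val))
    scale = {a: (pstdev(v) or 1.0) for a, v in by_attr.items()}

    truth = {}
    for _ in range(n_iters):
        # Step 1: estimate every cell from the weighted answers.
        for (row, attr), votes in by_cell.items():
            if attr_types[attr] == "categorical":
                score = defaultdict(float)
                for w, val in votes:
                    score[val] += weight[w]
                truth[(row, attr)] = max(score, key=score.get)
            else:
                total = sum(weight[w] for w, _ in votes)
                truth[(row, attr)] = (
                    sum(weight[w] * float(val) for w, val in votes) / total
                )

        # Step 2: one weight per worker, pooled over all attribute types.
        errors = defaultdict(list)
        for w, row, attr, val in answers:
            t = truth[(row, attr)]
            if attr_types[attr] == "categorical":
                errors[w].append(0.0 if val == t else 1.0)
            else:
                errors[w].append(min(1.0, abs(float(val) - t) / scale[attr]))
        # The small constant keeps every weight strictly positive.
        weight = {w: 1.0 - sum(e) / len(e) + 1e-6 for w, e in errors.items()}
    return truth, weight


# Made-up answers from three hypothetical workers about one table row.
answers = [
    ("w1", "r1", "nationality", "China"),
    ("w2", "r1", "nationality", "China"),
    ("w3", "r1", "nationality", "USA"),
    ("w1", "r1", "age", 55),
    ("w2", "r1", "age", 56),
    ("w3", "r1", "age", 20),
]
attr_types = {"nationality": "categorical", "age": "continuous"}
truth, weight = infer_truth(answers, attr_types)
```

In this toy input, worker `w3` disagrees with the others on both attributes, so its pooled weight drops and the age estimate settles between the values given by `w1` and `w2`. The sketch only illustrates why pooling evidence across attributes can help; the system described here may use a different, more principled formulation.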
