Judgment analysis of crowdsourced opinions using biclustering

The problem of deriving a final judgment from crowdsourced opinions is addressed with an unsupervised approach. Biclustering is shown to be useful for identifying the annotators crucial for a judgment. We establish that a suitable fraction of the entire dataset is sufficient for appropriate judgment analysis. Because the proposed method does not operate over the entire dataset, it is also useful for big data analysis.

Annotation by crowd workers serving online has gained attention in recent years across diverse fields due to its distributed power of problem solving. Distributing a labeling task among a large set of workers (experts or non-experts) and obtaining a final consensus is a popular way of performing large-scale annotation in limited time. Collecting multiple annotations can be effective for annotating large-scale datasets in applications such as natural language processing and image processing. However, since crowd workers are not necessarily experts, their opinions may not be accurate enough, which makes it difficult to derive the final aggregated judgment. Moreover, majority voting (MV) is not well suited to such problems because the number of annotators is limited and each question offers multiple options to choose from, which can cause excessive conflict among the opinions provided. Additionally, some annotators may annotate (provide spam opinions for) too many questions at random in order to maximize their payment, introducing noise into the final judgment. In this paper, we address the problem of crowd judgment analysis in an unsupervised way and propose a biclustering-based approach to obtain the judgments appropriately. The effectiveness of this approach is demonstrated on four publicly available small-scale Amazon Mechanical Turk datasets, along with a large-scale CrowdFlower dataset. We also compare the algorithm with MV and several other existing algorithms. In most cases the proposed approach is competitively better than the others; most importantly, it does not use the entire dataset for deriving the judgment.
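To make the setting concrete, the sketch below works on a toy annotator-by-question opinion matrix. It contrasts plain majority voting over all annotators with voting over a coherent subset of annotators selected by pairwise agreement. This is only a rough stand-in for the idea of restricting judgment to a relevant block of the data; the matrix, labels, the `coherent_subset` helper, and its `threshold` parameter are all hypothetical and do not reproduce the biclustering algorithm proposed in the paper.

```python
import numpy as np
from collections import Counter

# Toy annotator-by-question opinion matrix (rows: annotators, columns: questions).
# Labels are hypothetical multiple-choice options; this example is illustrative only.
opinions = np.array([
    ["A", "B", "A", "C", "B"],
    ["A", "B", "A", "C", "B"],
    ["C", "A", "B", "A", "C"],   # an annotator answering more or less at random
    ["A", "B", "A", "C", "A"],
    ["B", "C", "A", "B", "B"],
])

def majority_vote(matrix):
    """Plain majority voting per question; ties are broken arbitrarily."""
    return [Counter(matrix[:, j]).most_common(1)[0][0] for j in range(matrix.shape[1])]

def agreement(a, b):
    """Fraction of questions on which two annotators give the same label."""
    return float(np.mean(a == b))

def coherent_subset(matrix, threshold=0.6):
    """Greedy stand-in for a bicluster of reliable annotators: keep those whose
    mean pairwise agreement with the rest exceeds `threshold` (an assumed value)."""
    n = matrix.shape[0]
    mean_agree = [
        np.mean([agreement(matrix[i], matrix[k]) for k in range(n) if k != i])
        for i in range(n)
    ]
    keep = [i for i, s in enumerate(mean_agree) if s >= threshold]
    return keep or list(range(n))  # fall back to all annotators if nothing passes

print("MV on all annotators:      ", majority_vote(opinions))
kept = coherent_subset(opinions)
print("Annotators kept:           ", kept)
print("MV on the coherent subset: ", majority_vote(opinions[kept]))
```

With few annotators and many answer options, the full-matrix vote is easily disturbed by random or spam responses, whereas voting within the agreeing subset uses only part of the data, which is the intuition the abstract appeals to when it notes that the proposed method does not need the entire dataset.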
