CONSENSUS-BASED CROWDSOURCING: TECHNIQUES AND APPLICATIONS

Crowdsourcing solutions are receiving increasing attention in the recent literature on social computing and distributed problem solving. In general terms, crowdsourcing can be regarded as a social-computing model aimed at fostering the autonomous formation and emergence of the so-called wisdom of the crowd. Quality assessment is a crucial issue for the effectiveness of crowdsourcing systems, concerning both task and worker management. A further aspect to be considered is the kind of contributions workers can make. Crowdsourcing approaches usually rely only on tasks where workers decide among a predefined set of possible solutions, whereas tasks that leave workers greater freedom in producing their answer (e.g., free-hand drawing) are more difficult to manage and verify.

In this thesis, we present the LiquidCrowd approach, based on consensus and trustworthiness techniques, for managing the execution of collaborative tasks. By collaborative task, we refer to a task for which a factual answer is not possible or appropriate, or whose result depends on the personal perception or point of view of the worker. We introduce the notion of worker trustworthiness to denote the worker's reliability, namely her/his capability to foster the successful completion of tasks. Furthermore, we extend the conventional score-based mechanism with the notion of award, a bonus granted to workers who contribute to reaching consensus within their group. This way, groups satisfying given trustworthiness requirements can be composed on demand to deal with complex tasks, such as tasks for which consensus was not reached during the first execution. In LiquidCrowd, we define a democratic mechanism based on the notion of supermajority to enable the flexible specification of the degree of agreement required to reach consensus within a worker group. Three task typologies are provided: choice, where the worker is asked to choose an answer from a list of predefined options; range, where the worker is asked to provide a free numeric answer; and proposition, where the worker is asked to provide a free-text answer.

To evaluate the quality of the results obtained through the LiquidCrowd consensus techniques, we perform an evaluation against the SQUARE crowdsourcing benchmark. Furthermore, to assess the capability of LiquidCrowd to effectively support a real problem, real case studies on web data classification have been selected.
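To make the supermajority-based consensus mechanism and the award bonus more concrete, the following is a minimal sketch, not the thesis implementation: it shows how a consensus check over the three task typologies (choice, range, proposition) and a simple score/award update could be expressed. All concrete names and values here (evaluate_consensus, award_workers, the 2/3 supermajority threshold, the numeric tolerance for range answers, the text normalisation for propositions) are hypothetical choices made for this illustration.

```python
# Sketch of a supermajority consensus check for the three LiquidCrowd task
# typologies, plus an award update for consensual workers.
# Assumed parameters: they are NOT taken from the thesis.
from collections import Counter
from statistics import median

SUPERMAJORITY = 2 / 3     # assumed required degree of agreement
RANGE_TOLERANCE = 0.1     # assumed relative tolerance for numeric answers
AWARD_BONUS = 1.0         # assumed bonus for workers who reach consensus


def evaluate_consensus(task_type, answers):
    """Return (consensus_reached, consensus_value, consensual_workers).

    answers: dict mapping worker id -> answer (option label, number, or free text).
    """
    if task_type == "choice":
        counts = Counter(answers.values())
        value, votes = counts.most_common(1)[0]
        winners = [w for w, a in answers.items() if a == value]
    elif task_type == "range":
        value = median(answers.values())
        tol = abs(value) * RANGE_TOLERANCE or RANGE_TOLERANCE
        winners = [w for w, a in answers.items() if abs(a - value) <= tol]
        votes = len(winners)
    elif task_type == "proposition":
        # crude normalisation: case-insensitive, whitespace-collapsed match
        def normalise(s):
            return " ".join(s.lower().split())
        counts = Counter(normalise(a) for a in answers.values())
        value, votes = counts.most_common(1)[0]
        winners = [w for w, a in answers.items() if normalise(a) == value]
    else:
        raise ValueError(f"unknown task type: {task_type}")

    reached = votes / len(answers) >= SUPERMAJORITY
    return reached, (value if reached else None), (winners if reached else [])


def award_workers(scores, consensual_workers):
    """Grant the award bonus to workers who contributed to reaching consensus."""
    for worker in consensual_workers:
        scores[worker] = scores.get(worker, 0.0) + AWARD_BONUS
    return scores


if __name__ == "__main__":
    answers = {"w1": "museum", "w2": "museum", "w3": "library", "w4": "museum"}
    ok, value, winners = evaluate_consensus("choice", answers)
    print(ok, value, winners)          # True museum ['w1', 'w2', 'w4']
    print(award_workers({}, winners))  # {'w1': 1.0, 'w2': 1.0, 'w4': 1.0}
```

In this sketch, a task whose answers do not meet the supermajority threshold would simply report no consensus; re-submitting it to a group composed on demand with stricter trustworthiness requirements, as described above, is left out of the illustration.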
