Provenance-driven Representation of Crowdsourcing Data for Efficient Data Analysis

Crowdsourcing has proved to be a feasible way of harnessing human computation for solving complex problems. However, crowdsourcing frequently faces challenges in data handling, task reusability, and platform selection, and domain scientists rely on eScientists to find solutions to them. CrowdTruth is a framework that builds on existing crowdsourcing platforms and provides an enhanced way to manage crowdsourcing tasks across platforms, offering solutions to these commonly faced challenges. Provenance modeling provides means for documenting and examining scientific workflows. CrowdTruth keeps a provenance trace of the data flow through the framework, allowing eScientists to trace how data was transformed, and by whom, to reach its final state. In this way, eScientists have a tool to determine the impact that crowdsourcing has on enhancing their data.
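The idea of a provenance trace that records how each data artifact was derived, by which activity, and by which agent can be illustrated with a minimal sketch. This is not CrowdTruth's implementation (which follows established provenance models such as the Open Provenance Model / W3C PROV); all names here, such as `ProvenanceTrace` and the entity identifiers, are hypothetical and chosen only to show the entity/activity/agent pattern and a lineage query:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvRecord:
    entity: str                 # identifier of the data artifact produced
    activity: str               # transformation that produced it
    agent: str                  # who (or what) performed the transformation
    derived_from: tuple = ()    # input entities the artifact was derived from

class ProvenanceTrace:
    """A toy append-only trace of entity/activity/agent records."""

    def __init__(self):
        self.records = []

    def record(self, entity, activity, agent, derived_from=()):
        self.records.append(ProvRecord(entity, activity, agent, tuple(derived_from)))

    def lineage(self, entity):
        """Walk derivations backwards, returning every record that
        contributed to the given entity, most recent first."""
        out, frontier = [], [entity]
        while frontier:
            current = frontier.pop()
            for r in self.records:
                if r.entity == current:
                    out.append(r)
                    frontier.extend(r.derived_from)
        return out

# Hypothetical crowdsourcing data flow: raw worker annotations are
# filtered, then aggregated into disagreement metrics.
trace = ProvenanceTrace()
trace.record("raw_annotations", "collect", "crowd_workers")
trace.record("filtered_annotations", "spam_filter", "pipeline",
             ["raw_annotations"])
trace.record("disagreement_metrics", "aggregate", "pipeline",
             ["filtered_annotations"])

for r in trace.lineage("disagreement_metrics"):
    print(f"{r.entity} <- {r.activity} by {r.agent}")
```

Querying the lineage of the final artifact walks back through every transformation and agent that shaped it, which is exactly the question an eScientist asks when assessing what crowdsourcing contributed to the end result.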
