CrowdLink: Crowdsourcing for Large-Scale Linked Data Management

Crowd sourcing is an emerging paradigm to exploit the notion of human-computation for solving various computational problems, which cannot be accurately solved solely by the machine-based solutions. We use crowd sourcing for large-scale link management in the Semantic Web. More specifically, we develop Crowd Link, which utilizes crowd workers for verification and creation of triples in Linking Open Data (LOD). LOD incorporates the core data sets in the Semantic Web, yet is not in full conformance with the guidelines for publishing high quality linked data on the Web. Our approach can help in enriching and improving quality of mission-critical links in LOD. Scalable LOD link management requires a hybrid approach, where human intelligent and machine intelligent tasks interleave in a workflow execution. Likewise, many other crowd sourcing applications require a sophisticated workflow specification not only on human intelligent tasks, but also machine intelligent tasks to handle data and control-flow, which is strictly deficient in the existing crowd sourcing platforms. Hence, we are strongly motivated to investigate the interplay of crowd sourcing, and semantically enriched workflows for better human-machine cooperation in task completion. We demonstrate usefulness of our approach through various link creation and verification tasks, and workflows using Amazon Mechanical Turk. Experimental evaluation demonstrates promising results in terms of accuracy of the links created, and verified by the crowd workers.

[1]  Gianluca Demartini,et al.  ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking , 2012, WWW.

[2]  Winston A Hide,et al.  Big data: The future of biocuration , 2008, Nature.

[3]  Jürgen Umbrich,et al.  Scalable and distributed methods for entity matching, consolidation and disambiguation over linked data corpora , 2012, J. Web Semant..

[4]  Amit P. Sheth,et al.  Linked Data Is Merely More Data , 2010, AAAI Spring Symposium: Linked Data Meets Artificial Intelligence.

[5]  Joseph G. Davis,et al.  Ontological Services Using Crowdsourcing , 2010 .

[6]  Maribel Acosta,et al.  A Semantically Enabled Architecture for Crowdsourced Linked Data Management , 2012, CrowdSearch.

[7]  Rob Miller,et al.  Crowdsourced Databases: Query Processing with People , 2011, CIDR.

[8]  Phuoc Tran-Gia,et al.  Anatomy of a Crowdsourcing Platform - Using the Example of Microworkers.com , 2011, 2011 Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing.

[9]  Jürgen Umbrich,et al.  An empirical survey of Linked Data conformance , 2012, J. Web Semant..

[10]  Hector Garcia-Molina,et al.  Turkalytics: analytics for human computation , 2011, WWW.

[11]  Tim Kraska,et al.  CrowdER: Crowdsourcing Entity Resolution , 2012, Proc. VLDB Endow..

[12]  Martin Gaedke,et al.  Silk - A Link Discovery Framework for the Web of Data , 2009, LDOW.

[13]  Kwong-Sak Leung,et al.  A Survey of Crowdsourcing Systems , 2011, 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing.

[14]  Aditya G. Parameswaran,et al.  Answering Queries using Humans, Algorithms and Databases , 2011, CIDR.

[15]  Alon Y. Halevy,et al.  Crowdsourcing systems on the World-Wide Web , 2011, Commun. ACM.

[16]  Elena Paslaru Bontas Simperl,et al.  CrowdMap: Crowdsourcing Ontology Alignment with Microtasks , 2012, SEMWEB.

[17]  Deborah L. McGuinness,et al.  owl:sameAs and Linked Data: An Empirical Study , 2010 .

[18]  Jun Zhao,et al.  Describing Linked Datasets On the Design and Usage of voiD, the "Vocabulary Of Interlinked Datasets" , 2009 .

[19]  Tim Kraska,et al.  CrowdDB: answering queries with crowdsourcing , 2011, SIGMOD '11.

[20]  Michael Hausenblas,et al.  Describing Linked Datasets , 2009, LDOW.

[21]  Kwong-Sak Leung,et al.  Task Matching in Crowdsourcing , 2011, 2011 International Conference on Internet of Things and 4th International Conference on Cyber, Physical and Social Computing.

[22]  Deborah L. McGuinness,et al.  When owl: sameAs Isn't the Same: An Analysis of Identity in Linked Data , 2010, SEMWEB.

[23]  Anja Gruenheid,et al.  Crowdsourcing Entity Resolution: When is A=B? , 2012 .

[24]  Omar Alonso,et al.  Perspectives on Infrastructure for Crowdsourcing , 2011 .

[25]  Joseph G. Davis From Crowdsourcing to Crowdservicing , 2011, IEEE Internet Computing.

[26]  Tim Berners-Lee,et al.  Linked Data - The Story So Far , 2009, Int. J. Semantic Web Inf. Syst..