Semantically Enriched Task and Workflow Automation in Crowdsourcing for Linked Data Management

Crowdsourcing is one of the new emerging paradigms to exploit the notion of human-computation for harvesting and processing complex heterogenous data to produce insight and actionable knowledge. Crowdsourcing is task-oriented, and hence specification and management of not only tasks, but also workflows should play a critical role. Crowdsourcing research can still be considered in its infancy. Significant need is felt for crowdsourcing applications to be equipped with well defined task and workflow specifications ranging from simple human-intelligent tasks to more sophisticated and cooperative tasks to handle data and control-flow among these tasks. Addressing this need, we have attempted to devise a generic, flexible and extensible task specification and workflow management mechanism in crowdsourcing. We have contextualized this problem to linked data management as our domain of interest. More specifically, we develop CrowdLink, which utilizes an architecture for automated task specification, generation, publishing and reviewing to engage crowdworkers for verification and creation of triples in the Linked Open Data (LOD) cloud. The LOD incorporates various core data sets in the semantic web, yet is not in full conformance with the guidelines for publishing high quality linked data on the web. Our approach is not only useful in efficiently processing the LOD management tasks, it can also help in enriching and improving quality of mission-critical links in the LOD. We demonstrate usefulness of our approach through various link creation and verification tasks, and workflows using Amazon Mechanical Turk. Experimental evaluation demonstrates promising results not only in terms of ease of task generation, publishing and reviewing, but also in terms of accuracy of the links created, and verified by the crowdworkers.

[1]  A. Barabasi,et al.  The human disease network , 2007, Proceedings of the National Academy of Sciences.

[2]  Winston A Hide,et al.  Big data: The future of biocuration , 2008, Nature.

[3]  Hiroyuki Kitagawa,et al.  Algorithms for structure-based grouping in XML-OLAP , 2009, Int. J. Web Inf. Syst..

[4]  Alon Y. Halevy,et al.  Crowdsourcing systems on the World-Wide Web , 2011, Commun. ACM.