Towards Expertise Modelling for Routing Data Cleaning Tasks within a Community of Knowledge Workers

Applications that consume data must deal with a variety of data quality issues, such as missing values, duplication, and incorrect values. Although automatic approaches can be used for data cleaning, their results can remain uncertain. Updates suggested by automatic data cleaning algorithms therefore require further human verification. This paper presents an approach for generating tasks for uncertain updates and routing these tasks to appropriate workers based on their expertise. Specifically, the paper tackles the problem of modelling the expertise of knowledge workers for the purpose of routing tasks within collaborative data quality management. The proposed expertise model represents the profile of a worker against a set of concepts describing the data. A simple routing algorithm leverages the expertise profiles to match data cleaning tasks with workers. The proposed approach is evaluated on a real-world dataset using human workers. The results demonstrate the effectiveness of using concepts for modelling expertise, measured by the likelihood of receiving responses to tasks routed to workers.
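
The idea of concept-based profiles and routing can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify how profiles are scored or how tasks are matched, so the `Worker` and `Task` types, the averaging rule in `match_score`, the top-k selection in `route`, and the concept labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Worker:
    name: str
    # Hypothetical profile: concept label -> expertise score in [0, 1].
    expertise: Dict[str, float] = field(default_factory=dict)


@dataclass
class Task:
    task_id: str
    # Concepts describing the data touched by an uncertain update.
    concepts: Set[str] = field(default_factory=set)


def match_score(worker: Worker, task: Task) -> float:
    """Assumed scoring rule: mean of the worker's scores over the task's concepts."""
    if not task.concepts:
        return 0.0
    total = sum(worker.expertise.get(c, 0.0) for c in task.concepts)
    return total / len(task.concepts)


def route(task: Task, workers: List[Worker], k: int = 1) -> List[Worker]:
    """Return the k workers whose profiles best match the task's concepts."""
    ranked = sorted(workers, key=lambda w: match_score(w, task), reverse=True)
    return ranked[:k]


# Illustrative data only; concept labels and scores are made up.
alice = Worker("alice", {"Film": 0.9, "Person": 0.4})
bob = Worker("bob", {"City": 0.8, "Person": 0.6})
task = Task("t1", {"Film", "Person"})

chosen = route(task, [alice, bob], k=1)
print([w.name for w in chosen])
```

Under these assumptions, a task about a film's director is routed to the worker whose profile covers both concepts, rather than to one who knows only about people.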
