An Entropy-based Approach to Crowd Entity Resolution

Crowdsourcing obtains ideas and content by soliciting contributions from a large group of people, especially from an online community. However, data generated by a crowd often contains duplicates: different contributors describe the same entity in different ways. To learn the crowd intention from crowd data, we therefore need to perform entity resolution. Previous work focuses on data matching and merging, but remains far from perfect in the crowdsourcing setting. In this study, we propose a generic way to measure and represent the crowd intention based on crowd data. Our contribution is twofold: (1) we propose a graph structure that represents the crowd intention; (2) we propose an entropy-based measurement that evaluates the diversity of the crowd intention.
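The abstract does not define the entropy-based measurement, so the following is only a minimal sketch of the general idea, assuming plain Shannon entropy over the distribution of crowd answers once duplicates have been resolved to canonical entities. The function name `shannon_entropy` and the sample data are hypothetical and are not taken from the paper; the paper's measure is defined on its graph structure and may differ.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels.

    Assumption: each label is the canonical entity that one crowd answer
    was resolved to. This is an illustrative stand-in, not the paper's measure.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over the observed entities.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical crowd answers after entity resolution.
agreed = ["cat", "cat", "cat", "cat"]   # everyone means the same entity
split = ["cat", "cat", "dog", "dog"]    # two equally popular entities

print(shannon_entropy(agreed))  # 0.0: no diversity
print(shannon_entropy(split))   # 1.0: one bit of diversity
```

Under this assumption, a value of zero means the crowd converges on a single entity, and larger values mean the answers are spread more evenly over more entities.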
