Crowdsourced Entity Markup

Entities such as people, places, and products appear in knowledge bases and linked data on the one hand, and in web pages, news articles, and social media on the other. Entity markup, such as Named Entity Recognition and Disambiguation (NERD), is the essential means of adding semantic value to unstructured web content, thereby enabling linkage between unstructured and structured data and knowledge collections. A major challenge in this endeavor lies in the dynamics of digital content about the world, with new entities emerging all the time. In this paper, we propose a crowdsourced framework for NERD that specifically addresses the challenge of emerging entities in social media. Our approach combines NERD techniques with the detection of entity alias names and with co-reference resolution in texts. We propose a crowdsourcing system based on a linking game for this combined task, and we report experimental insights and lessons learned with this approach.
