uPick: Crowdsourcing Based Approach to Extract Relations Among Named Entities

Despite advances in information extraction, identifying the relations among named entities in a text document remains a significant challenge. Existing automated approaches lack human precision and struggle to handle erroneous documents. In this paper, we propose a crowdsourcing-based approach to improve the accuracy of the relations generated by existing extraction techniques. Our idea is to gather judgments on the relations extracted from an article from users interested in that article. In return for contributing, users remember the facts related to the document. This paper presents the complete design of the approach along with a user study conducted with twelve participants. Results show that users rated the proposed system positively and were willing to contribute their time and energy to the task.
