Leveraging context in user-centric entity detection systems

A user-centric entity detection system is one in which the primary consumer of the detected entities is a person who can act on them (e.g., perform a search, view a map, or shop). We contrast this with machine-centric detection systems, where the primary consumer of the detected entities is a machine. Machine-centric detection systems typically focus on the quantity of detected entities, measured by precision and recall metrics, with the goal of correctly identifying every single entity in a document. However, the simple precision/recall scores of machine-centric entity detection systems fail to accurately reflect the quality of detected entities in user-centric systems, where users may not necessarily want to "see" every possible entity. We posit that not all of the detected entities in a given piece of text are necessarily relevant to the main topic of the text, nor are they necessarily interesting enough to the user to warrant further action. In fact, presenting all of the detected entities may annoy users to the point where they turn this capability off completely, an undesirable outcome. Therefore, we propose to measure the quality and utility of user-centric entity detection systems in three core dimensions: the accuracy, the interestingness, and the relevance of the entities they present to the user. We show that leveraging surrounding context can greatly improve the performance of such systems in all three dimensions by employing novel algorithms for generating a concept vector and for finding concept extensions using search query logs. We extensively evaluate the proposed algorithms within Contextual Shortcuts, a large-scale user-centric entity detection platform, using 1,586 entities detected over 1,519 documents. The results confirm the importance of using context within user-centric entity detection systems, and validate the usefulness of the proposed algorithms by showing how they improve the overall entity detection quality within Contextual Shortcuts.
