Leveraging Fine-Grained Wikipedia Categories for Entity Search

Ad-hoc entity search, the task of retrieving a ranked list of relevant entities in response to a natural-language query, has been widely studied. Matching entities by category, especially against fine-grained entity types/categories, has been shown to be critical to entity search performance. However, the potential of fine-grained Wikipedia entity categories has not been well exploited by existing studies. Based on the observation of how people describe entities of a specific type, we propose a headword-and-modifier model to deeply interpret both queries and fine-grained entity types/categories. Probabilistic generative models are designed to estimate the relevance of headwords and modifiers as a pattern-based matching problem, taking the Wikipedia type taxonomy as an important input to address the ad-hoc representations of concepts/entities in queries. Extensive experiments on three widely used test sets (INEX-XER 2009, SemSearch-LS and TREC-Entity) show that our method significantly improves entity search performance over state-of-the-art methods.
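To make the headword-and-modifier idea concrete, the following is a minimal sketch in Python. It is not the probabilistic generative model described above: the preposition list, the rule that the headword is the last token before the first preposition, the function names `split_head_modifiers` and `match_score`, and the `head_weight` parameter are all assumptions introduced here purely for illustration.

```python
# Illustrative heuristic only; NOT the paper's probabilistic model.
# Assumption: the headword is the last token before the first preposition.
PREPOSITIONS = {"of", "in", "by", "from", "for", "with", "on", "at"}


def split_head_modifiers(phrase):
    """Split a short noun phrase into (headword, list of modifiers)."""
    tokens = phrase.lower().split()
    if not tokens:
        return "", []
    # Index of the first preposition, or the phrase length if there is none.
    cut = next((i for i, t in enumerate(tokens) if t in PREPOSITIONS), len(tokens))
    if cut == 0:
        # Phrase starts with a preposition; fall back to the last token.
        cut = len(tokens)
    head_index = cut - 1
    head = tokens[head_index]
    modifiers = [
        t for i, t in enumerate(tokens)
        if i != head_index and t not in PREPOSITIONS
    ]
    return head, modifiers


def match_score(query, category, head_weight=0.6):
    """Score a category against a query; zero unless the headwords agree."""
    q_head, q_mods = split_head_modifiers(query)
    c_head, c_mods = split_head_modifiers(category)
    if not q_head or q_head != c_head:
        return 0.0
    if q_mods:
        # Fraction of query modifiers that also appear in the category.
        overlap = len(set(q_mods) & set(c_mods)) / len(set(q_mods))
    else:
        # The query imposes no modifier constraints.
        overlap = 1.0
    return head_weight + (1 - head_weight) * overlap


if __name__ == "__main__":
    print(split_head_modifiers("Novels by American authors"))
    print(match_score("science fiction films", "American science fiction films"))
    print(match_score("science fiction films", "American science fiction novels"))
```

Under this toy scoring, a category whose headword differs from the query's (for example "novels" versus "films") is rejected outright, while modifiers only refine the score among categories that share the headword. A learned, probabilistic estimate of headword and modifier relevance, as the abstract describes, would replace both hard-coded choices.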
