Entity Ranking and Relationship Queries using an Extended Graph Model

There is a large amount of textual data on the Web and in Wikipedia, where mentions of entities (such as Gandhi) are annotated with a link to the disambiguated entity (such as M. K. Gandhi). Such annotation may have been done manually (as in Wikipedia) or can be done using named entity recognition/disambiguation techniques. Such an annotated corpus allows queries to return entities instead of documents. Entity ranking queries retrieve entities that are related to keywords in the query and belong to a given type/category specified in the query; entity ranking has been an active area of research in the past few years. More recently, there have been extensions to allow entity-relationship queries, which allow specification of multiple sets of entities as well as relationships between them. In this paper we address the problem of entity ranking ("near") queries and entity-relationship queries on the Wikipedia corpus. We first present an extended graph model which combines the power of graph models used earlier for structured/semi-structured data with information from textual data. Based on this model, we show how to specify entity and entity-relationship queries, and define scoring methods for ranking answers. Finally, we provide efficient algorithms for answering such queries, exploiting a space-efficient in-memory graph structure. A performance comparison with the ERQ system proposed earlier shows significant improvement in answer quality for most queries, while also handling a much larger set of entity types.
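To make the notion of an entity ranking ("near") query concrete, the following is a minimal sketch over a toy annotated corpus. It is not the paper's graph model or scoring method: the corpus, the type table, the function name `near_query`, and the inverse-distance proximity score are all assumptions chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: each document is a list of (token, entity) pairs,
# where entity is the disambiguated entity the token links to, or None.
DOCS = [
    [("Gandhi", "M. K. Gandhi"), ("led", None), ("the", None), ("salt", None),
     ("march", None), ("in", None), ("India", "India")],
    [("Nehru", "J. Nehru"), ("became", None), ("prime", None),
     ("minister", None), ("after", None), ("independence", None)],
]

# Hypothetical type/category table for the entities above.
ENTITY_TYPES = {
    "M. K. Gandhi": {"person"},
    "J. Nehru": {"person"},
    "India": {"country"},
}


def near_query(entity_type, keywords, docs=DOCS, types=ENTITY_TYPES):
    """Rank entities of `entity_type` by proximity of their mentions to `keywords`.

    Assumed score: each (mention, keyword occurrence) pair in the same document
    contributes 1 / (1 + token distance). Entities with no nearby keyword
    occurrence are omitted from the result.
    """
    wanted = {k.lower() for k in keywords}
    scores = defaultdict(float)
    for doc in docs:
        # Positions of keyword occurrences in this document.
        kw_positions = [i for i, (tok, _) in enumerate(doc) if tok.lower() in wanted]
        for pos, (_, entity) in enumerate(doc):
            # Skip plain tokens and entities of the wrong type.
            if entity is None or entity_type not in types.get(entity, ()):
                continue
            for kp in kw_positions:
                scores[entity] += 1.0 / (1 + abs(pos - kp))
    # Highest-scoring entities first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # "Find persons near the keywords 'salt march'."
    print(near_query("person", ["salt", "march"]))
```

An entity-relationship query would extend this idea by ranking tuples of entities (one per entity set) whose mentions co-occur with the relationship keywords, rather than ranking single entities.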
