TRank: Ranking Entity Types Using the Web of Data

Much of today's Web search and browsing activity is centered around entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities, such as pictures, short summaries, related entities, and factual information. A key facet that is often displayed on SERPs, and that is instrumental for many applications, is the entity type. However, background knowledge bases usually associate an entity not with a single generic type but with a set of more specific types, which may or may not be relevant given the document context. For example, one can find on the Linked Open Data cloud that Tom Hanks is a person, an actor, and a person from Concord, California. All of these types are correct, but some may be too general to be interesting (e.g., person), others may be interesting but already known to the user (e.g., actor), and still others may be irrelevant in the current browsing context (e.g., person from Concord, California). In this paper, we define the new task of ranking entity types given an entity and its context. We propose and evaluate new methods that find the most relevant entity type based on collection statistics and on the graph structure interconnecting entities and types. An extensive experimental evaluation over several document collections at different levels of granularity (e.g., sentences and paragraphs) and over different type hierarchies (including DBpedia, Freebase, and schema.org) shows that hierarchy-based approaches provide more accurate results when picking entity types to display to the end user, while remaining highly scalable.
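To make the task concrete, the following is a minimal sketch of ranking an entity's types by combining two signals of the kind the abstract mentions: a hierarchy signal (deeper, more specific types score higher) and a context signal (types shared by other entities in the same context, including through their ancestors, score higher). This is an illustrative assumption, not the paper's exact method; the toy hierarchy, the type names such as `PersonFromConcordCalifornia`, the function names, and the weight `alpha` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy type hierarchy: child -> parent (None marks the root).
PARENT = {
    "Thing": None,
    "Person": "Thing",
    "Actor": "Person",
    "PersonFromConcordCalifornia": "Person",
    "AmericanFilmActor": "Actor",
}


def depth(t, parent):
    """Number of edges from type t up to the root of the hierarchy."""
    d = 0
    while parent[t] is not None:
        t = parent[t]
        d += 1
    return d


def ancestors_and_self(t, parent):
    """Type t followed by all of its ancestors up to the root."""
    out = []
    while t is not None:
        out.append(t)
        t = parent[t]
    return out


def rank_types(entity_types, context_types, parent, alpha=1.0):
    """Rank an entity's types by hierarchy depth plus context support.

    entity_types:  types attached to the entity in the knowledge base.
    context_types: one collection of types per other entity in the context.
    alpha:         hypothetical weight of the context signal.
    """
    # A context entity supports a type if it has that type or a subtype of it.
    support = Counter()
    for types in context_types:
        closure = {a for t in types for a in ancestors_and_self(t, parent)}
        support.update(closure)
    scores = {t: depth(t, parent) + alpha * support[t] for t in entity_types}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


if __name__ == "__main__":
    tom_hanks = {"Person", "Actor", "PersonFromConcordCalifornia"}
    # Hypothetical context: a film review mentioning two co-stars.
    co_stars = [{"AmericanFilmActor"}, {"Actor"}]
    print(rank_types(tom_hanks, co_stars, PARENT))
```

In this film-review context, `Actor` ranks first and `PersonFromConcordCalifornia` last, because the co-stars support `Actor` (and, through the hierarchy, `Person`) but not the location-based type. Propagating support to ancestors is what lets a context entity typed only as `AmericanFilmActor` still count as evidence for `Actor`.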
