Finding additional semantic entity information for search engines

Entity-oriented search has become an essential component of modern search engines. It focuses on retrieving a list of entities, or information about specific entities, rather than documents. In this paper, we study the problem of finding entity-related information, expressed as attribute-value pairs, which plays a significant role in searching for target entities. We propose a novel decomposition framework that combines reduced relations with a discriminative model, the Conditional Random Field (CRF), to automatically find entity-related attribute-value pairs in free-text documents. The framework allows us to locate candidate text fragments and to identify their hidden semantics, in the form of attribute-value pairs, for user queries. Empirical analysis shows that the decomposition framework outperforms pattern-based approaches because it integrates syntactic and semantic features effectively.
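
To make the target output concrete, the following is a minimal sketch of how a token-level labeling, such as a CRF tagger might produce, could be turned into attribute-value pairs. The BIO tag set (`B-ATTR`, `I-ATTR`, `B-VAL`, `I-VAL`, `O`), the function names, and the rule of pairing each attribute with the next value are assumptions made for illustration; they are not taken from the paper, and the tags are hand-written here rather than predicted by a trained model.

```python
from typing import List, Tuple


def spans_from_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span begins; close any span that is still open.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continue the open span of the same label.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open span
            # and skip this token.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = None
            current_tokens = []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


def pair_attributes(spans: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each ATTR span with the next VAL span (an assumed heuristic)."""
    pairs: List[Tuple[str, str]] = []
    pending_attr = None
    for label, text in spans:
        if label == "ATTR":
            pending_attr = text
        elif label == "VAL" and pending_attr is not None:
            pairs.append((pending_attr, text))
            pending_attr = None
    return pairs


# Hypothetical sentence with hand-assigned tags, for illustration only.
tokens = ["The", "camera", "has", "a", "resolution", "of", "12", "megapixels"]
tags = ["O", "O", "O", "O", "B-ATTR", "O", "B-VAL", "I-VAL"]

pairs = pair_attributes(spans_from_bio(tokens, tags))
print(pairs)
```

In a full system the tags would come from a trained sequence model using syntactic and semantic features; this sketch covers only the final step of reading pairs off a labeled sequence.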