The role of documents vs. queries in extracting class attributes from text

Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs, rather than document collections, as self-contained sources of data in textual information extraction. The differences are quantified as part of a large-scale study on extracting prominent attributes, or quantifiable properties, of classes (e.g., top speed, price, and fuel consumption for CarModel) from unstructured text. In a head-to-head qualitative comparison, a lightweight extraction method produces class attributes that are 45% more accurate on average when acquired from query logs rather than from Web documents.
