Low-Cost Supervision for Multiple-Source Attribute Extraction

Previous studies on extracting class attributes from unstructured text consider either Web documents or query logs as the source of textual data. Web search queries have been shown to yield attributes of higher quality. However, because many relevant attributes found in Web documents occur infrequently in query logs, Web documents remain an important source for extraction. In this paper, we introduce Bootstrapped Web Search (BWS) extraction, the first approach to extracting class attributes simultaneously from both sources. Extraction is guided by a small set of seed attributes and does not rely on further domain-specific knowledge. BWS is shown to improve both extraction precision and attribute relevance across 40 test classes.
