Using structured text for large-scale attribute extraction

We propose a weakly supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is around 30% higher than that of previous methods operating on Web documents. In addition to extracting attributes, the approach automatically identifies values for a subset of the extracted class attributes.
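The abstract does not describe the extraction procedure itself. As a rough illustration only, the Python sketch below shows one simple way to harvest attributes and values from structured text. It assumes that "structured text" means HTML tables whose rows have exactly two cells, with the first cell read as an attribute and the second as its value. It then ranks the candidate attributes of one class by how many instance pages contain them. The names `TwoColumnTableParser`, `extract_pairs` and `rank_attributes`, and the `min_support` threshold, are hypothetical. This is not the weakly supervised method proposed in this work.

```python
from collections import Counter
from html.parser import HTMLParser


class TwoColumnTableParser(HTMLParser):
    """Collect HTML table rows that contain exactly two cells.

    Hypothetical heuristic: a two-cell row is read as (attribute name, value).
    Assumes reasonably well-formed markup with explicit </td>/</th>/</tr> tags.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # completed two-cell rows, as [left_text, right_text]
        self._row = None   # cell texts of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td>/<th> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace before storing the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if len(self._row) == 2:  # keep only attribute/value-shaped rows
                self.rows.append(self._row)
            self._row, self._cell = None, None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_pairs(html):
    """Return {attribute: value} from the two-cell rows of one page."""
    parser = TwoColumnTableParser()
    parser.feed(html)
    parser.close()
    pairs = {}
    for attr, value in parser.rows:
        attr = attr.rstrip(":").strip().lower()  # normalize "Capital:" -> "capital"
        if attr and value:
            pairs.setdefault(attr, value)  # keep the first value seen on the page
    return pairs


def rank_attributes(pages, min_support=2):
    """Rank candidate class attributes by the number of instance pages mentioning them.

    min_support is a hypothetical cutoff; ties keep first-seen order.
    """
    support = Counter()
    for html in pages:
        support.update(extract_pairs(html).keys())
    return [(a, n) for a, n in support.most_common() if n >= min_support]
```

For example, given three toy country pages whose tables contain Capital, Population and Currency rows, with Capital on all three, Population on two and Currency on one, `rank_attributes` keeps `capital` and `population` and drops `currency`. For each kept attribute, `extract_pairs` also returns a per-page value such as `Paris`.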
