Using structured text for large-scale attribute extraction

We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is around 30% higher than with previous methods operating on Web documents. In addition to attribute extraction, this approach also automatically identifies values for a subset of the extracted class attributes.

Sujith Ravi | Marius Pasca
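
As a rough illustration of weakly-supervised attribute extraction from structured text, the sketch below is a toy example and not the authors' algorithm. It assumes that each Web document has already been reduced to two-column label/value rows, such as an HTML table or a "key: value" list. It also assumes that weak supervision comes from a few hypothetical seed attributes per class. Rows whose table also contains a seed are counted as candidate attributes, and the cell next to an accepted attribute is taken as its value. The function names `normalize`, `extract_attributes`, and `extract_values`, the `min_support` threshold, and the support-count scoring are all illustrative choices.

```python
from collections import Counter

# Hypothetical input format: one "table" is a list of (label, value) rows taken
# from structured text (e.g. a two-column HTML table) in a Web document.
Table = list[tuple[str, str]]


def normalize(label: str) -> str:
    """Lowercase a row label and drop surrounding whitespace and trailing colons."""
    return label.strip().rstrip(":").strip().lower()


def extract_attributes(tables: list[Table], seeds: list[str], min_support: int = 2):
    """Rank non-seed labels by the number of seed-bearing tables they appear in.

    Weak supervision (an assumption of this sketch): a table is trusted only
    if at least one of its row labels matches a seed attribute of the class.
    """
    seed_set = {normalize(s) for s in seeds}
    support: Counter[str] = Counter()
    for table in tables:
        labels = {normalize(label) for label, _ in table}
        if labels & seed_set:
            for label in labels - seed_set:
                support[label] += 1
    # Keep candidates seen in enough trusted tables, most frequent first.
    return [(attr, n) for attr, n in support.most_common() if n >= min_support]


def extract_values(table: Table, attributes: list[str]) -> dict[str, str]:
    """Pair each accepted attribute found in one table with its adjacent value cell."""
    wanted = set(attributes)
    return {
        normalize(label): value.strip()
        for label, value in table
        if normalize(label) in wanted
    }


if __name__ == "__main__":
    tables = [
        [("Capital:", "Paris"), ("Population", "67 million"), ("Currency", "Euro")],
        [("Capital", "Berlin"), ("Currency", "Euro"), ("Area", "357,022 km2")],
        [("Capital", "Tokyo"), ("Population", "125 million"), ("Currency", "Yen")],
        [("Price", "$10"), ("Shipping", "Free")],  # no seed label, so it is ignored
    ]
    attrs = extract_attributes(tables, seeds=["capital"])
    print(attrs)
    print(extract_values(tables[0], [a for a, _ in attrs]))
```

With the seed "capital", the toy tables yield "currency" (support 3) and "population" (support 2). "Area" is dropped because it falls below `min_support`. Only the attributes that actually appear in a given table receive values, which loosely mirrors the idea of identifying values for a subset of the extracted attributes.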