Learning and Evaluating the Content and Structure of a Term Taxonomy

In this paper, we describe a weakly supervised bootstraping algorithm that reads Web texts and learns taxonomy terms. The bootstrapping algorithm starts with two seed words (a seed hypernym (Root concept) and a seed hyponym) that are inserted into a doubly anchored hyponym pattern. In alternating rounds, the algorithm learns new hyponym terms and new hypernym terms that are subordinate to the Root concept. We conducted an extensive evaluation with human annotators to evaluate the learned hyponym and hypernym terms for two categories: animals and people.

[1]  Oren Etzioni,et al.  Open Information Extraction from the Web , 2007, CACM.

[2]  Luis Gravano,et al.  Snowball: extracting relations from large plain-text collections , 2000, DL '00.

[3]  Tom M. Mitchell,et al.  Learning to construct knowledge bases from the World Wide Web , 2000, Artif. Intell..

[4]  Ellen Riloff,et al.  Automatically Generating Extraction Patterns from Untagged Text , 1996, AAAI/IAAI, Vol. 2.

[5]  Bruce W. Porter,et al.  Controlling Search for the Consequences of New Information During Knowledge Integration , 1989, ML.

[6]  Eduard Hovy,et al.  Towards terascale knowledge acquisition , 2004, COLING 2004.

[7]  Ellen Riloff,et al.  Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs , 2008, ACL.

[8]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[9]  Dan I. Moldovan,et al.  Learning Semantic Constraints for the Automatic Discovery of Part-Whole Relations , 2003, NAACL.

[10]  Patrick Pantel,et al.  Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations , 2006, ACL.

[11]  Ari Rappoport,et al.  Fully Unsupervised Discovery of Concept-Specific Relationships by Web Mining , 2007, ACL.

[12]  Eugene Charniak,et al.  Finding Parts in Very Large Corpora , 1999, ACL.

[13]  Doug Downey,et al.  Unsupervised named-entity extraction from the Web: An experimental study , 2005, Artif. Intell..

[14]  Marius Pasca,et al.  Acquisition of categorized named entities for web search , 2004, CIKM '04.

[15]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[16]  Patrick Pantel,et al.  Automatically Labeling Semantic Classes , 2004, NAACL.

[17]  Sergey Brin,et al.  Extracting Patterns and Relations from the World Wide Web , 1998, WebDB.

[18]  Jerry R. Hobbs,et al.  Learning by Reading: A Prototype System, Performance Baseline and Lessons Learned , 2007, AAAI.

[19]  Johanna Völker,et al.  Towards large-scale, open-domain and ontology-based named entity classification , 2005 .

[20]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .