Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs

We present a novel approach to weakly supervised semantic class learning from the web, using a single powerful hyponym pattern combined with graph structures, which capture two properties associated with pattern-based extractions: popularity and productivity. Intuitively, a candidate is popular if it was discovered many times by other instances in the hyponym pattern. A candidate is productive if it frequently leads to the discovery of other instances. Together, these two measures capture not only frequency of occurrence, but also cross-checking that the candidate occurs both near the class name and near other class members. We developed two algorithms that begin with just a class name and one seed instance and then automatically generate a ranked list of new class instances. We conducted experiments on four semantic classes and consistently achieved high accuracies.
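
The two measures can be illustrated with a small sketch. It assumes, for illustration only, that each extraction is stored as a directed edge from the instance used in the pattern query to the instance it discovered, that popularity is approximated by in-degree, and that productivity is approximated by out-degree. The abstract does not give the exact scoring functions, so the names `build_graph` and `rank_candidates` and the toy extractions below are hypothetical.

```python
from collections import defaultdict


def build_graph(extractions):
    """Build in/out adjacency sets from (source, discovered) pairs.

    Assumption: an edge source -> discovered means the hyponym pattern,
    instantiated with `source`, extracted `discovered`.
    """
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for source, discovered in extractions:
        if source == discovered:
            continue  # ignore self-discoveries
        out_edges[source].add(discovered)
        in_edges[discovered].add(source)
    return out_edges, in_edges


def rank_candidates(extractions):
    """Rank nodes by (popularity, productivity), both descending.

    Popularity is approximated by in-degree and productivity by
    out-degree; these are illustrative stand-ins, not the paper's
    exact measures. Ties are broken alphabetically.
    """
    out_edges, in_edges = build_graph(extractions)
    nodes = set(out_edges) | set(in_edges)
    scores = {
        n: (len(in_edges.get(n, ())), len(out_edges.get(n, ())))
        for n in nodes
    }
    ranked = sorted(nodes, key=lambda n: (-scores[n][0], -scores[n][1], n))
    return ranked, scores


# Hypothetical extractions for the class "animals" with seed "lion".
toy_extractions = [
    ("lion", "tiger"),
    ("lion", "bear"),
    ("tiger", "lion"),
    ("tiger", "bear"),
    ("bear", "tiger"),
    ("bear", "zoo"),
]

ranked, scores = rank_candidates(toy_extractions)
for name in ranked:
    popularity, productivity = scores[name]
    print(f"{name}: popularity={popularity}, productivity={productivity}")
```

In this toy graph, "zoo" is discovered once but never leads to any other instance, so it ranks below candidates that are both discovered by and lead to other class members. This is the cross-checking intuition described above.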
