Distinguishing between Instances and Classes in the Wikipedia Taxonomy

This paper presents an automatic method for differentiating between instances and classes in a large-scale taxonomy induced from the Wikipedia category network. The method exploits characteristics of the category names and the structure of the network. It is the first attempt to make this distinction automatically in a large-scale resource; in WordNet and Cyc, by contrast, the distinction has been made through manual annotation. We evaluate the result against ResearchCyc: on the subnetwork shared by our taxonomy and ResearchCyc, we report 84.52% accuracy.
