Towards large-scale, open-domain and ontology-based named entity classification

Research on named entity recognition and classification has so far focused mainly on supervised techniques and has typically considered only small sets of classes into which the recognized entities are classified. In this paper we address the classification of named entities with respect to large sets of classes specified by a given ontology. Our approach is unsupervised, as it relies on no labeled training data, and open-domain, as the ontology can simply be exchanged. It builds on Harris’ distributional hypothesis and the vector-space model: a named entity is assigned to the contextually most similar concept from the ontology. The main contribution of the paper is a systematic analysis of how varying certain parameters affects such a context-based approach, which exploits similarities in vector space to disambiguate named entities.
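
The core idea of assigning an entity to its contextually most similar concept can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the feature set, weighting scheme, or similarity measure, so the word-window contexts, raw co-occurrence counts, and cosine similarity used here are assumptions, as are the function names `context_vector`, `cosine`, and `classify` and the toy corpus.

```python
from collections import Counter
from math import sqrt


def context_vector(target, sentences, window=2):
    """Count the words within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Collect the neighbours, skipping the target position itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def classify(entity, concepts, sentences, window=2):
    """Return the concept whose context vector is most similar to the entity's."""
    entity_vec = context_vector(entity, sentences, window)
    return max(
        concepts,
        key=lambda c: cosine(entity_vec, context_vector(c, sentences, window)),
    )


# Toy corpus (hypothetical); the concept labels stand in for ontology concepts.
sentences = [
    "we stayed at the hotel near the beach",
    "we stayed at the hilton near the airport",
    "the river flows through the valley",
    "the danube flows through the city",
]

print(classify("hilton", ["hotel", "river"], sentences))
print(classify("danube", ["hotel", "river"], sentences))
```

In this sketch the window size is the only tunable parameter; the parameters whose impact the paper analyses are not listed in the abstract and are therefore not modelled here.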
