Classifying Unknown Proper Noun Phrases Without Context

We present a probabilistic generative model for classifying unknown proper noun phrases into semantic categories. The core of the classifier is an n-gram character model, which is enhanced with an n-gram word-length model and a common word model. While most prior work has depended largely on context or domain-specific rules for semantic disambiguation of unknown names, we demonstrate that there is surprisingly reliable statistical information available in the composition of the names themselves. The context-independent probabilities assigned by our domain-independent classifier are sufficient to achieve greater than 90% classification accuracy on typical tasks.
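
To make the idea of a class-conditional character n-gram model concrete, the following is a minimal sketch, not the model described in this paper. It assumes character trigrams, add-one smoothing, and lowercased input, and it omits the word-length and common word components entirely. The class name `CharNgramClassifier`, the padding symbols `^` and `$`, the category labels, and the toy training names are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


class CharNgramClassifier:
    """Illustrative class-conditional character n-gram classifier (add-one smoothing)."""

    def __init__(self, n=3):
        self.n = n
        # category -> n-gram string -> count
        self.ngram_counts = defaultdict(lambda: defaultdict(int))
        # category -> (n-1)-character context -> count
        self.context_counts = defaultdict(lambda: defaultdict(int))
        self.category_counts = defaultdict(int)
        self.alphabet = set()

    def _ngrams(self, name):
        # Pad with start symbols and one end symbol; lowercasing is a simplification.
        padded = "^" * (self.n - 1) + name.lower() + "$"
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            yield gram[:-1], gram[-1]

    def train(self, examples):
        for name, category in examples:
            self.category_counts[category] += 1
            for context, char in self._ngrams(name):
                self.ngram_counts[category][context + char] += 1
                self.context_counts[category][context] += 1
                self.alphabet.add(char)

    def log_score(self, name, category):
        # log P(category) + sum over characters of log P(char | context, category)
        total = sum(self.category_counts.values())
        score = math.log(self.category_counts[category] / total)
        vocab = len(self.alphabet) + 1  # one extra slot for unseen characters
        for context, char in self._ngrams(name):
            num = self.ngram_counts[category].get(context + char, 0) + 1
            den = self.context_counts[category].get(context, 0) + vocab
            score += math.log(num / den)
        return score

    def classify(self, name):
        return max(self.category_counts, key=lambda c: self.log_score(name, c))


# Toy training data, invented for illustration only.
toy_examples = [
    ("Amoxicillin", "drug"), ("Penicillin", "drug"), ("Ampicillin", "drug"),
    ("Azithromycin", "drug"), ("Erythromycin", "drug"),
    ("Acme Corp", "company"), ("Globex Inc", "company"), ("Initech Corp", "company"),
    ("Umbrella Inc", "company"), ("Hooli Inc", "company"),
]

clf = CharNgramClassifier(n=3)
clf.train(toy_examples)
print(clf.classify("Cloxacillin"))
print(clf.classify("Vandelay Inc"))
```

The classifier sees only the characters of the name itself, with no surrounding sentence, which is the sense in which the probabilities are context-independent. A realistic system would need far more training data and a better smoothing scheme than add-one.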
