Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition

This paper describes a novel statistical named-entity (i.e. "proper name") recognition system built around a maximum entropy framework. By working within the framework of maximum entropy theory and utilizing a flexible object-based architecture, the system is able to make use of an extraordinarily diverse range of knowledge sources in making its tagging decisions. These knowledge sources include capitalization features, lexical features, features indicating the current section of text (i.e. headline or main body), and dictionaries of single- or multi-word terms. The purely statistical system contains no hand-generated patterns and achieves a result comparable with the best statistical systems. However, when combined with other hand-coded systems, the system achieves scores that exceed the highest comparable scores published thus far.

1 Introduction

Named entity recognition is one of the simplest of the common message understanding tasks. The objective is to identify and categorize all members of certain categories of "proper names" in a given corpus. The specific test bed that is the subject of this paper is the Seventh Message Understanding Conference (MUC-7), in which the task was to identify "names" falling into one of seven categories: person, organization, location, date, time, percentage, and monetary amount. This paper describes a new system called "Maximum Entropy Named Entity" or "MENE" (pronounced "meanie"). By working within the framework of maximum entropy theory and utilizing a flexible object-based architecture, the system is able to make use of an extraordinarily diverse range of knowledge sources in making its tagging decisions. These knowledge sources include capitalization features, lexical features, and features indicating the current section of text.
It makes use of a broad array of dictionaries of useful single- or multi-word terms such as first names, company names, and corporate suffixes, and automatically handles cases where words appear in more than one dictionary. Our dictionaries required no manual editing and were either downloaded from the web or were simply "obvious" lists entered by hand. This system, built from off-the-shelf knowledge sources, contains no hand-generated patterns and achieves a result comparable with that of the best statistical systems. Further experiments showed that when combined with hand-coded systems from NYU, the University of Manitoba, and IsoQuest, Inc., MENE was able to generate scores which exceeded the highest scores thus far reported by any system on a MUC evaluation. Given appropriate training data, we believe that this system is highly portable to other domains and languages, and we have already achieved good results on upper-case English. We also feel that there are plenty of avenues to explore in enhancing the system's performance on English-language newspaper text.
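The way a maximum entropy model combines such heterogeneous knowledge sources can be sketched as follows. This is a minimal illustration, not MENE itself: the dictionaries (`FIRST_NAMES`, `CORP_SUFFIXES`), the three-tag set, and the plain gradient-ascent trainer (a stand-in for the Generalized Iterative Scaling procedure classic maxent systems use) are all simplifying assumptions. The point is that capitalization, lexical, section-of-text, and dictionary evidence all enter the model uniformly as binary features, each receiving its own learned weight; overlapping dictionary memberships simply activate multiple features at once.

```python
import math
from collections import defaultdict

# Hypothetical mini-dictionaries standing in for the downloaded word lists.
FIRST_NAMES = {"john", "mary"}
CORP_SUFFIXES = {"inc.", "corp."}

# Toy tag set; the real task distinguishes seven name categories.
TAGS = ["person", "organization", "other"]

def features(tokens, i, section):
    """Binary features mirroring the knowledge sources described above:
    lexical, capitalization, section-of-text, and dictionary features."""
    w = tokens[i]
    feats = [f"word={w.lower()}", f"section={section}"]
    if w[0].isupper():
        feats.append("init_cap")
    if w.isupper():
        feats.append("all_caps")
    if w.lower() in FIRST_NAMES:
        feats.append("dict=first_name")
    if w.lower() in CORP_SUFFIXES:
        feats.append("dict=corp_suffix")
    return feats

def predict(weights, feats):
    # Maxent / multinomial-logistic form: p(tag | context) is proportional
    # to exp(sum of the weights of the active features for that tag).
    scores = {t: sum(weights.get((t, f), 0.0) for f in feats) for t in TAGS}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}

def train(examples, epochs=100, lr=0.5):
    # Plain gradient ascent on the conditional log-likelihood; classic
    # maxent trainers instead use iterative scaling, but the fitted model
    # has the same exponential form.
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict(weights, feats)
            for t in TAGS:
                target = 1.0 if t == gold else 0.0
                for f in feats:
                    weights[(t, f)] += lr * (target - probs[t])
    return weights
```

Given a few labeled contexts, the model generalizes through shared features: a capitalized word in the first-name dictionary gets a high "person" probability even if the word itself was never seen in training, because `init_cap` and `dict=first_name` carry the learned evidence.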
