Japanese Named Entity Recognition based on a Simple Rule Generator and Decision Tree Learning

Named entity (NE) recognition is the task of detecting proper nouns and numerical information in a document and classifying them into categories such as person, organization, location, and date. NE recognition plays an essential role in information extraction systems and question answering systems. It is well known that hand-crafted systems with a large set of heuristic rules are difficult to maintain, whereas corpus-based statistical approaches are expected to be more robust and to require less human intervention. Several statistical approaches have been reported in the literature. In a recent Japanese NE workshop, a maximum entropy (ME) system outperformed decision tree systems and most hand-crafted systems. Here, we propose an alternative method based on a simple rule generator and decision tree learning. Our experiments show that its performance is comparable to that of the ME approach. We also found that it can be trained more efficiently with a large set of training data and that it improves readability.
