Named Entity Recognition: Evaluation of Existing Systems

Named Entity Recognition (NER), a subfield of information extraction, is becoming increasingly important. It helps machines recognize proper nouns (entities) in text and associate them with the appropriate types. Common types in NER systems include location, person name, date, and address. Several NER systems exist today. What is the core technology behind these systems? Which kind of system performs better? How can this technology be improved in the future? This master's thesis presents both basic and detailed knowledge about NER. Three existing NER systems are chosen for evaluation: GATE, CRFClassifier, and LbjNerTagger. These systems are based on different NER technologies and are representative of most existing NER systems. The thesis presents and evaluates these three systems and identifies the advantages and disadvantages of each.
