A survey of named entity recognition and classification

This survey covers fifteen years of research in Named Entity Recognition and Classification (NERC), from 1991 to 2006. We report observations on the languages, named entity types, domains and textual genres studied in the literature. Early NERC systems relied on hand-made rules, whereas machine learning techniques are now widely used. We survey these techniques along with other critical aspects of NERC, such as features and evaluation methods. Features are word-level, dictionary-level and corpus-level representations of the words in a document. Evaluation techniques, which range from intuitive exact match to very complex matching schemes with adjustable error costs, are key to progress in the field.
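
As an illustration of the simplest evaluation scheme mentioned above, the following is a minimal sketch of exact-match scoring. It assumes that each entity is represented as a (start token, end token, type) triple and that a prediction counts as correct only when both its boundaries and its type agree with a gold annotation. The representation, the function name `exact_match_scores` and the sample data are hypothetical and are not taken from any of the surveyed systems or evaluation campaigns.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (start_token, end_token, entity_type).
Entity = Tuple[int, int, str]


def exact_match_scores(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact matching.

    A predicted entity is correct only if an identical triple
    (same boundaries and same type) appears in the gold set.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    correct = len(gold_set & pred_set)

    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented example: one exact hit, one type error, one boundary error.
gold = [(0, 2, "PERSON"), (5, 6, "LOCATION"), (9, 11, "ORGANIZATION")]
pred = [(0, 2, "PERSON"), (5, 6, "ORGANIZATION"), (9, 10, "ORGANIZATION")]

precision, recall, f1 = exact_match_scores(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this strict criterion the type error and the boundary error both count as complete misses. More elaborate schemes of the kind the survey refers to would instead assign them partial credit or separate error costs.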
