Feature Generation of Dictionary for Named-Entity Recognition based on Machine Learning

Named-entity recognition (NER), a subtask of information extraction, is now used in information retrieval as well as in question-answering systems. Unlike ordinary words, named entities (NEs) are continually created and changed in Web documents, newspapers, and other sources. This steady generation of new NEs causes an unknown-word problem and makes it difficult to build application systems that rely on NER. To alleviate this problem, this paper proposes a new feature generation method for machine learning-based NER. In general, features in machine learning-based NER are defined over words, whereas entries in named-entity dictionaries are phrases, so the dictionary entries cannot be used directly as features of an NER system. This paper proposes an encoding scheme, used as a feature generation method, that converts phrase-level entities into word-level features. Furthermore, this scheme allows entities carrying semantic information from WordNet to be converted into features of the NER system. Our experiments show that the F1 score increases by about 6% and that errors are reduced by about 38%.
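The abstract does not spell out the encoding scheme, so the following is only a minimal sketch of the general idea of turning phrase-level dictionary entries into word-level features. It assumes a BIO-style positional encoding and greedy longest-match lookup, neither of which is confirmed by the paper. The function name `dictionary_features`, the toy `DICTIONARY`, and the labels `ORG` and `LOC` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: each phrase (a tuple of lowercased words) maps to an entity type.
DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("new", "york", "times"): "ORG",
    ("new", "york"): "LOC",
    ("seoul",): "LOC",
}


def dictionary_features(
    tokens: List[str],
    dictionary: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Return one dictionary feature per token: 'B-TYPE', 'I-TYPE', or 'O'.

    Assumption: BIO-style encoding with greedy longest-match lookup.
    The paper's actual encoding scheme may differ.
    """
    features = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + n])
            label = dictionary.get(phrase)
            if label is not None:
                features[i] = "B-" + label          # first word of the phrase
                for j in range(i + 1, i + n):
                    features[j] = "I-" + label      # remaining words of the phrase
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return features


tokens = "She joined the New York Times in Seoul".split()
features = dictionary_features(tokens, DICTIONARY)
for token, feature in zip(tokens, features):
    print(f"{token}\t{feature}")
```

Each token now carries a single word-level feature that a word-based sequence labeler can consume alongside its other features. Under the same assumption, a WordNet-derived category could be encoded in place of the dictionary label.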
