Next Improvement Towards Linear Named Entity Recognition Using Character Gazetteers

Natural Language Processing (NLP) is an important and interesting area of computer science that also affects other fields of science, e.g., geographical processing, social statistics, and molecular biology. A large amount of textual data is continuously produced in the media around us, so it must be processed in order to extract the required information. One of the most important processing steps in NLP is Named Entity Recognition (NER), which recognizes occurrences of known entities in input texts. We have previously presented our approach to linear NER using gazetteers, namely the Hash-map Multi-way Tree (HMT) and the first-Child next-Sibling binary Tree (CST), along with their strengths and weaknesses. In this paper, we present the Patricia Hash-map Tree (PHT) character gazetteer approach, which proves to be the best compromise between the two previous versions in terms of matching time and memory consumption.
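To make the idea of a character gazetteer concrete, the following is a minimal sketch of a plain character-level trie whose nodes keep their children in a hash map. It is an illustration only, not the HMT, CST, or PHT implementation from this work: the names `Node`, `build_gazetteer`, and `find_entities` are hypothetical, no Patricia-style path compression is applied, and word boundaries are not checked. The scan restarts from each unmatched position, so its cost stays proportional to the text length only when entry lengths are bounded.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """One trie node; children are kept in a hash map keyed by character."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.label: Optional[str] = None  # entity type if an entry ends here


def build_gazetteer(entries: Dict[str, str]) -> Node:
    """Insert every gazetteer entry character by character."""
    root = Node()
    for name, label in entries.items():
        node = root
        for ch in name:
            node = node.children.setdefault(ch, Node())
        node.label = label
    return root


def find_entities(text: str, root: Node) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for the longest entry found at each position."""
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(text):
        node = root
        best: Optional[Tuple[int, int, str]] = None
        j = i
        # Walk down the trie for as long as the text keeps matching.
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.label is not None:
                best = (i, j, node.label)  # remember the longest match so far
        if best is not None:
            matches.append(best)
            i = best[1]  # continue after the matched entity
        else:
            i += 1
    return matches


if __name__ == "__main__":
    # Hypothetical toy gazetteer, for illustration only.
    gazetteer = build_gazetteer({"New York": "LOC", "New York Times": "ORG"})
    print(find_entities("The New York Times is in New York.", gazetteer))
```

Each character lookup is a single hash-map access, which is what keeps per-character matching cost roughly constant; the trade-off is the memory overhead of one hash map per node.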
