Token Gazetteer and Character Gazetteer for Named Entity Recognition

Named entity recognition (NER) in information extraction (IE) systems is usually based on large gazetteers, i.e. datasets of well-known, classified entities. NER is also often performed by an independent lookup component, which is considered a bottleneck in many NER systems. In this paper, we present two approaches to building tree-based gazetteers for NER: lookup by token and lookup by character.
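
To make the distinction between the two lookup styles concrete, the following is a minimal sketch of a tree (trie) gazetteer in Python. It is an illustration under stated assumptions, not the implementation described in this paper: the class names `TrieNode` and `TreeGazetteer`, the greedy longest-match strategy, the whitespace tokenization, and the sample entries are all hypothetical. The same tree is keyed by whole tokens in one case and by single characters in the other.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[Hashable, "TrieNode"] = {}
        self.label: Optional[str] = None  # entity class if an entry ends here


class TreeGazetteer:
    """Hypothetical trie over arbitrary symbols (tokens or characters)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, symbols: Sequence[Hashable], label: str) -> None:
        node = self.root
        for symbol in symbols:
            node = node.children.setdefault(symbol, TrieNode())
        node.label = label

    def longest_match(
        self, symbols: Sequence[Hashable], start: int
    ) -> Optional[Tuple[int, str]]:
        """Return (end, label) of the longest entry starting at `start`."""
        node = self.root
        best = None
        for i in range(start, len(symbols)):
            node = node.children.get(symbols[i])
            if node is None:
                break
            if node.label is not None:
                best = (i + 1, node.label)
        return best

    def find_all(self, symbols: Sequence[Hashable]) -> List[Tuple[int, int, str]]:
        """Greedy left-to-right scan; returns (start, end, label) spans."""
        matches = []
        i = 0
        while i < len(symbols):
            hit = self.longest_match(symbols, i)
            if hit is None:
                i += 1
            else:
                end, label = hit
                matches.append((i, end, label))
                i = end  # skip past the matched span
        return matches


# Illustrative entries only (assumed, not taken from the paper).
entries = [
    ("New York", "LOCATION"),
    ("New York Times", "ORGANIZATION"),
    ("York", "LOCATION"),
]

token_gaz = TreeGazetteer()  # one tree edge per token
char_gaz = TreeGazetteer()   # one tree edge per character
for name, label in entries:
    token_gaz.add(name.split(), label)
    char_gaz.add(name, label)

text = "She reads the New York Times in New York"
token_hits = token_gaz.find_all(text.split())  # spans are token indices
char_hits = char_gaz.find_all(text)            # spans are character offsets
```

In this sketch the token tree is shallow but requires the input to be tokenized first, while the character tree works directly on raw text at the cost of deeper paths. Note that the character variant as written performs no word-boundary check, so it would also match an entry that occurs inside a longer word.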