CustNER: A Rule-Based Named-Entity Recognizer With Improved Recall

This article describes CustNER, a system for named-entity recognition (NER) of persons, locations, and organizations. From an analysis of incorrect annotations produced by existing NER systems, four categories of false negatives have been identified. The named entities (NEs) left unannotated include those that contain nationalities, those that have a corresponding resource in DBpedia, and those that are acronyms of other NEs. To recover them, a rule-based system, CustNER, is proposed that builds on existing NER systems and the DBpedia knowledge base. CustNER has been trained on the Open Knowledge Extraction (OKE) challenge 2017 dataset and evaluated on the OKE and CoNLL03 (Conference on Natural Language Learning) datasets. The OKE dataset has also been annotated with the three entity types. On the OKE dataset, CustNER outperforms existing NER systems, with an F-score 12.4% better than Stanford NER and 3.1% better than Illinois NER. On CoNLL03, a standard evaluation dataset on which the system was not trained, CustNER gives results comparable to existing systems: its F-score is 3.9% better than that of Stanford NER, though the Illinois NER F-score is 1.3% better than that of CustNER.
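
To make the acronym category of false negatives concrete, the following is a minimal illustrative sketch, not CustNER's actual rules. It assumes the output of an existing NER system is available as a list of (surface form, type) pairs, and it adds any acronym of an already-recognized multi-word entity that also appears in the text. The function names acronym_of and add_acronym_entities, and the way the acronym is built from capitalized words, are assumptions made for this example.

```python
import re

def acronym_of(name):
    """Build an acronym from the capitalized words of a multi-word name.

    Returns None for single-word names (hypothetical helper for illustration).
    """
    words = [w for w in name.split() if w[0].isupper()]
    return "".join(w[0] for w in words) if len(words) > 1 else None

def add_acronym_entities(text, base_entities):
    """Extend base NER output with acronym mentions found in the text.

    base_entities: list of (surface, type) pairs from an existing NER system.
    An acronym inherits the type of the entity it abbreviates.
    """
    found = list(base_entities)
    known = {surface for surface, _ in base_entities}
    for surface, etype in base_entities:
        acro = acronym_of(surface)
        if acro is None or acro in known:
            continue
        # Case-sensitive whole-word match, so "WHO" does not match "who".
        if re.search(r"\b" + re.escape(acro) + r"\b", text):
            found.append((acro, etype))
            known.add(acro)
    return found

text = "The World Health Organization issued a report. WHO officials commented."
base = [("World Health Organization", "ORG")]
result = add_acronym_entities(text, base)
```

In this sketch the base annotations are never removed or retyped; the rule only adds mentions, which is one simple way to raise recall without touching what the underlying NER system already found.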