Recognizing Named Entities in Agriculture Documents using LDA based Topic Modelling Techniques

Abstract. Named Entity Recognition (NER) is one of the fundamental processes in Natural Language Processing applications. In this paper, we propose Agriculture Named Entity Recognition using Topic Modelling techniques (the AERTM algorithm). In the agriculture domain, we have identified Names of Crops, Soil Types, Names of Pathogens, Crop Diseases and Fertilizers as the key entities. Our work presents a hybrid approach that combines the agriculture vocabulary AGROVOC with the AERTM algorithm. We used AGROVOC to identify crop names, but it failed to identify Soil Types, Crop Diseases and Fertilizers. For those entities we therefore propose a Latent Dirichlet Allocation (LDA) based topic modelling algorithm. The recognized entities can be used to build a knowledge base, which can in turn be used mainly in Relation Extraction systems, forums supported by various distinguished government repositories, etc. Because no benchmark agriculture dataset is available, we tested our model on 3000 sentences extracted from reputable agriculture websites. Human evaluation of the method confirms that our approach gives an accuracy of 80%.
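The hybrid idea described above can be illustrated with a minimal sketch. It is not the AERTM algorithm itself: the crop set is a tiny hypothetical stand-in for AGROVOC, and the per-topic word lists are written by hand to stand in for the top words an LDA model might produce. The names `CROP_VOCAB`, `TOPIC_WORDS` and `tag_entities` are assumptions made for illustration only.

```python
# Hypothetical stand-in for AGROVOC crop terms (the real vocabulary is far larger).
CROP_VOCAB = {"rice", "wheat", "maize"}

# Hypothetical top words per topic, written by hand for this sketch.
# In an LDA-based pipeline these would come from a trained topic model,
# with each topic mapped to an entity type.
TOPIC_WORDS = {
    "SOIL_TYPE": {"loam", "clay", "alluvial"},
    "CROP_DISEASE": {"blight", "wilt", "mildew"},
    "FERTILIZER": {"urea", "potash", "compost"},
}


def tag_entities(sentence):
    """Return (token, label) pairs for tokens recognised as entities."""
    tagged = []
    for raw in sentence.lower().split():
        token = raw.strip(".,;:()")
        # Step 1: vocabulary lookup for crop names.
        if token in CROP_VOCAB:
            tagged.append((token, "CROP"))
            continue
        # Step 2: fall back to the topic word lists for the other entity types.
        for label, words in TOPIC_WORDS.items():
            if token in words:
                tagged.append((token, label))
                break
    return tagged


if __name__ == "__main__":
    print(tag_entities("Rice grown in clay soil needs urea, and blight is common."))
```

The vocabulary check runs first so that a curated term takes precedence over a topic-derived one; the fallback only handles tokens the vocabulary does not cover.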