RENT: Regular Expression and NLP-Based Term Extraction Scheme for Agricultural Domain

This paper addresses automatic term extraction in the agricultural domain. There is a pressing need to process effectively the large volume of agricultural data that currently lies unprocessed. The proposed method builds on basic named-entity recognition techniques and resequences the conventional automatic term extraction procedure. The baseline algorithm uses several domain-specific patterns identified by domain experts. After evaluating the baseline's performance, several improvements were proposed based on the results obtained on a given agricultural text, and these improvements were incorporated into the RENT algorithm. Both algorithms were applied to more than 1400 pages of agricultural text. RENT significantly outperforms the baseline algorithm, achieving precision above 80 %, recall above 60 %, and F-measure above 68 % on random samples. A comparison with Termine, a well-known term extraction tool, is also presented and shows that RENT achieves higher precision.
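
To make the reported measures concrete, the following minimal Python sketch shows how precision, recall, and F-measure can be computed for a term extractor against a hand-labelled reference list. It is an illustration only, not the evaluation code used for RENT: the function name `evaluate_terms`, the case-insensitive exact matching, the choice of the balanced F-measure (F1), and the toy term lists are all assumptions introduced here.

```python
def evaluate_terms(extracted, gold):
    """Return (precision, recall, f_measure) for extracted terms vs. a gold list.

    Assumption: a term counts as correct only on an exact match after
    lowercasing and trimming whitespace; the paper may match differently.
    """
    extracted_set = {t.strip().lower() for t in extracted}
    gold_set = {t.strip().lower() for t in gold}

    true_positives = len(extracted_set & gold_set)

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    # Balanced F-measure (F1): harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data, not taken from the paper's corpus.
extracted_terms = ["drip irrigation", "soil pH", "crop rotation",
                   "leaf blight", "last year"]
gold_terms = ["drip irrigation", "soil pH", "crop rotation",
              "leaf blight", "green manure", "seed rate"]

precision, recall, f_measure = evaluate_terms(extracted_terms, gold_terms)
print(f"precision={precision:.3f} recall={recall:.3f} f_measure={f_measure:.3f}")
```

Under the balanced F-measure assumed here, a precision of 0.80 combined with a recall of 0.60 gives an F-measure of about 0.686, which is consistent with the thresholds reported above.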