GenNER - A highly scalable and optimal NER method for text-based gene and protein recognition

The scientific literature now offers many models that recognize and extract gene mentions from text, and several datasets have been developed to train them. However, very few of these models can both improve their knowledge and performance progressively as new annotated text becomes available and take the granularity of their input text into account. We propose GenNER, a method for recognizing gene and protein mentions in free text. GenNER combines continuous learning with a text granularization algorithm applied to the model's input, which allows it to achieve better performance. We evaluated GenNER on the BioCreative II annotated datasets and obtained an average F1-score of 0.9704, which outperforms current methods.
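
For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall over recognized mentions. The following is only a minimal sketch of that computation, assuming mentions are compared as exact-match character spans; the `Span` type, the `f1_score` helper, and the toy offsets are hypothetical, and the scoring protocol actually used for the evaluation may differ (for example, by accepting alternative boundaries).

```python
from typing import Set, Tuple

# Hypothetical representation: a mention is a (start, end) character span.
Span = Tuple[int, int]


def f1_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Mention-level F1 under exact span matching (an assumption of this sketch)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, not taken from any real corpus.
gold = {(0, 5), (10, 14), (20, 27), (30, 33)}
predicted = {(0, 5), (10, 14), (21, 27)}  # last span has a wrong start offset
score = f1_score(gold, predicted)
```

In this toy case two of the three predicted spans match the four gold spans exactly, so precision is 2/3, recall is 1/2, and the resulting F1 is 4/7.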