Domain-independent Model for Chemical Compound and Drug Name Recognition

This paper briefly describes the work that we carried out as part of our participation in the BioCreative-IV Track-2 shared task on chemical compound and drug name recognition. We submit five runs, all of which are based on machine learning approaches. The machine learning techniques we use are Conditional Random Fields (CRF), Support Vector Machines (SVM) and a simple ensemble technique. Our system is domain-independent in the sense that it does not make use of any domain-specific external resources or tools. Here we report evaluation results only for those runs in which the development set is not included in the training data. We obtain the best performance with a CRF-based model, which achieves micro-averaged recall, precision and F-score values of 72.80%, 75.82% and 74.28%, respectively. The same model yields macro-averaged recall, precision and F-score values of 73.96%, 74.22% and 72.47%, respectively.
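The difference between the micro- and macro-averaged scores reported above can be illustrated with a minimal sketch. It assumes that macro-averaging is computed per document and that the F-score is the harmonic mean of precision and recall; the function names and the toy counts are hypothetical and are not taken from the shared task's evaluation tool.

```python
from typing import List, Tuple

# Hypothetical per-document counts: (true positives, false positives, false negatives)
Counts = Tuple[int, int, int]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for one set of counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_average(docs: List[Counts]) -> Tuple[float, float, float]:
    """Pool the counts over all documents, then compute the scores once."""
    tp = sum(d[0] for d in docs)
    fp = sum(d[1] for d in docs)
    fn = sum(d[2] for d in docs)
    return prf(tp, fp, fn)


def macro_average(docs: List[Counts]) -> Tuple[float, float, float]:
    """Compute the scores per document, then average them (assumed definition)."""
    scores = [prf(*d) for d in docs]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


if __name__ == "__main__":
    # Toy data only: one document with high precision and low recall,
    # and one with low precision and high recall.
    toy_docs = [(1, 0, 3), (1, 3, 0)]
    print("micro P/R/F:", micro_average(toy_docs))
    print("macro P/R/F:", macro_average(toy_docs))
```

Under these assumptions, the macro-averaged F-score is the mean of per-document F-scores rather than the harmonic mean of the macro-averaged precision and recall, which is one way a macro F-score can fall below both of the other two macro values.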