Biological Entity Recognition with Conditional Random Fields

Due to the rapid evolution of molecular biology and the lack of naming standards, biological entity recognition (BER) remains a challenging task for information extraction and natural language understanding. In this study, we present a statistical machine learning approach for extracting features, modeling, and predicting biological named entities. Our approach combines UMLS semantic types with MetaMap, SemRep, and ABGene within the conditional random fields (CRF) framework, and learns both the structure and the parameters of a statistical model. The results are competitive with those of state-of-the-art tools in this field. Unlike similar competing approaches, the presented method is fully automatic, and is therefore more generalizable and directly transferable to other named entity recognition (NER) problems in medical informatics.
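
As an illustration of the kind of per-token feature extraction that typically precedes CRF training, the following minimal sketch builds one feature dictionary per token. It is not the authors' feature set: the function names, the chosen features, and the toy dictionary TOY_SEMANTIC_TYPES (a hypothetical stand-in for semantic-type labels that a tool such as MetaMap would supply) are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical stand-in for a semantic-type lookup. A real system would
# obtain these labels from an external tool rather than a hard-coded dict.
TOY_SEMANTIC_TYPES: Dict[str, str] = {
    "p53": "gene_or_genome",
    "insulin": "amino_acid_peptide_or_protein",
}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build an illustrative feature dictionary for the token at position i."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "suffix3": tok[-3:].lower(),
        # Assumed feature: semantic type of the token, "none" if unknown.
        "semantic_type": TOY_SEMANTIC_TYPES.get(tok.lower(), "none"),
    }
    # Context features from the neighbouring tokens, with boundary markers.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["The", "p53", "gene", "regulates", "apoptosis"]):
        print(feats)
```

Feature dictionaries of this shape are a common input format for linear-chain CRF toolkits, which would then learn weights over the features jointly with label-transition weights.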