Conditional Random Fields for Term Extraction

In this paper, we describe how to construct a machine learning framework that uses syntactic information to extract biomedical terms. Conditional random fields (CRFs) serve as the basis of this framework. We investigate how syntactic information, including parent nodes, syntactic paths, and term ratios, can best be used within the machine learning framework. The experimental results show that syntactic paths and term ratios can improve the precision of term extraction for both old terms and novel terms. However, the recall for novel terms still needs to be improved. This research serves as an example for constructing machine-learning-based term extraction systems that utilize linguistic information.
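
As a rough illustration of how syntactic information might be turned into per-token features for a CRF sequence labeler, the following Python sketch builds one feature dictionary per token. The abstract does not define these features precisely, so everything here is an assumption made for illustration: the `Token` layout (each token storing the index of its parent in a parse), the syntactic path as the chain of POS tags from a token up to the root, and the term ratio as the fraction of a word's corpus occurrences that fall inside annotated terms. The toy sentence and counts are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    word: str
    pos: str
    parent: Optional[int]  # index of the parent token; None marks the root


def path_to_root(tokens: List[Token], i: int) -> str:
    """Join the POS tags from token i up to the root (assumed path definition)."""
    tags: List[str] = []
    seen = set()
    cur: Optional[int] = i
    while cur is not None and cur not in seen:  # guard against cyclic parses
        seen.add(cur)
        tags.append(tokens[cur].pos)
        cur = tokens[cur].parent
    return "/".join(tags)


def term_ratio(word: str, term_counts: Dict[str, int],
               total_counts: Dict[str, int]) -> float:
    """Share of a word's occurrences that lie inside known terms (assumed definition)."""
    key = word.lower()
    total = total_counts.get(key, 0)
    return term_counts.get(key, 0) / total if total else 0.0


def token_features(tokens: List[Token], i: int, term_counts: Dict[str, int],
                   total_counts: Dict[str, int]) -> Dict[str, object]:
    """Build the feature dictionary for token i."""
    tok = tokens[i]
    parent_pos = tokens[tok.parent].pos if tok.parent is not None else "ROOT"
    return {
        "word": tok.word.lower(),
        "pos": tok.pos,
        "parent_pos": parent_pos,
        "syn_path": path_to_root(tokens, i),
        "term_ratio": term_ratio(tok.word, term_counts, total_counts),
    }


# Toy data, invented for this example only.
SENTENCE = [
    Token("IL-2", "NN", 1),
    Token("gene", "NN", 2),
    Token("activates", "VBZ", None),
]
TERM_COUNTS = {"il-2": 3, "gene": 3}
TOTAL_COUNTS = {"il-2": 4, "gene": 6, "activates": 5}

FEATURES = [token_features(SENTENCE, i, TERM_COUNTS, TOTAL_COUNTS)
            for i in range(len(SENTENCE))]
```

The resulting list of dictionaries, one per token, is the kind of input that CRF toolkits accepting per-token feature maps typically expect; pairing it with a label sequence (for example, B/I/O tags marking term boundaries) would give one training instance.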