Parameter estimation of Conditional Random Fields model based on cloud computing

Conditional Random Field (CRF), a type of conditional probability model, has been widely used in Natural Language Processing (NLP) tasks such as sequential data segmentation and labeling. An advantage of the CRF model is its ability to express long-distance dependencies and overlapping features. However, estimating the parameters of a CRF is very time-consuming because of the large amount of computation involved. This paper describes a method that uses the MapReduce model to estimate CRF model parameters in parallel on the open-source distributed computing framework provided by Hadoop. Experiments demonstrate that the proposed method can effectively reduce the time required for model parameter estimation.
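
Parallel estimation is possible because the CRF log-likelihood and its gradient are sums over training sequences: each sequence contributes its observed feature counts minus the feature counts expected under the current model. The following is a minimal sketch of that decomposition in plain Python, not the Hadoop implementation described in this paper. The label set, feature templates, toy data, and the function names map_gradient and reduce_gradients are all hypothetical, and the expected counts are computed by brute-force enumeration of label sequences rather than by the forward-backward algorithm a practical system would use.

```python
import math
from collections import defaultdict
from itertools import product

LABELS = ("B", "I")  # hypothetical toy label set


def features(xs, ys):
    """Count emission and transition features for one labeled sequence."""
    counts = defaultdict(float)
    for i, (x, y) in enumerate(zip(xs, ys)):
        counts[("emit", y, x)] += 1.0
        if i > 0:
            counts[("trans", ys[i - 1], y)] += 1.0
    return counts


def score(weights, feats):
    """Unnormalized log-score: dot product of weights and feature counts."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def map_gradient(weights, example):
    """Map step: log-likelihood and gradient contribution of one sequence."""
    xs, ys = example
    # Brute force over every labeling; only feasible for tiny sequences.
    candidates = [features(xs, cand) for cand in product(LABELS, repeat=len(xs))]
    scores = [score(weights, f) for f in candidates]
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))

    gold = features(xs, ys)
    grad = defaultdict(float, gold)  # start from the observed counts
    for feats, s in zip(candidates, scores):
        p = math.exp(s - log_z)  # model probability of this labeling
        for k, v in feats.items():
            grad[k] -= p * v  # subtract the expected counts
    return score(weights, gold) - log_z, grad


def reduce_gradients(partials):
    """Reduce step: sum the per-sequence log-likelihoods and gradients."""
    total_ll = 0.0
    total_grad = defaultdict(float)
    for ll, grad in partials:
        total_ll += ll
        for k, v in grad.items():
            total_grad[k] += v
    return total_ll, total_grad


# Hypothetical training data: (observations, labels) pairs.
data = [
    (("a", "b", "a"), ("B", "I", "B")),
    (("b", "b"), ("B", "I")),
]
weights = {}  # all-zero weights, i.e. a uniform model
loglik, grad = reduce_gradients(map_gradient(weights, ex) for ex in data)
```

In a MapReduce setting, each call to map_gradient could run on a different node over its own shard of the training data, and only the summed gradient would need to be returned to the optimizer that updates the weights between iterations.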