Analyzing Sequence Data Based on Conditional Random Fields with Co-training

Sequence data plays an important role in data analysis applications such as sequence classification. An important part of sequence data analysis is obtaining labeled sequence data and using a machine learning model to predict sequence structures. Conditional Random Fields (CRF) is one such machine learning method and is widely used in sequential data analysis, because CRF can effectively capture contextual correlations in the data when training data is abundant. In real applications, however, labeled training data is usually difficult to collect. To reduce the amount of labeled training data required, a novel model named Conditional Random Fields with Co-training (Co-CRF) is proposed. The Co-CRF model works well even with a reduced amount of labeled training data. Empirical results show that Co-CRF produces more accurate analysis than the traditional CRF, especially when labeled training data is very limited.