A Method of Annotating Disease Names in TCM Patents Based on Co-training

In the era of big data, annotated text data remains a scarce resource. Annotated semantic information can serve both as keywords for text analysis, mining, and intelligent retrieval and as valuable training and test sets for machine learning. In the analysis, mining, and intelligent retrieval of Traditional Chinese Medicine (TCM) patents, disease names are an important annotation target, just as Chinese herbal medicine names and medicinal efficacies are. Exploiting the characteristics of TCM patent texts, this paper proposes a method for annotating disease names in them based on co-training, a semi-supervised machine learning technique. Experiments show that the method is feasible and effective. It can also be extended to annotate other semantic information in TCM patents.
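The abstract does not spell out the algorithm, so the following is only a minimal sketch of generic co-training, not the paper's method. The general idea is that each example is described by two views, a separate classifier is trained on each view, and each classifier adds its most confident labels to a shared training set that the other classifier then learns from. Everything specific here is a hypothetical assumption: the two views (a suffix feature of the candidate string versus a context-word feature from the surrounding patent text), the feature strings, the simple vote-counting classifier, and the names `train`, `predict`, `co_train`, `rounds`, `per_round`, and `threshold`.

```python
from collections import Counter, defaultdict

# Hypothetical example format: (view1_features, view2_features[, label]).
# View 1: features of the candidate string itself (e.g. its last character).
# View 2: features of its surrounding context in the patent text.

def train(labeled, view):
    """Count how often each feature of one view co-occurs with each label."""
    table = defaultdict(Counter)
    for ex in labeled:
        for feat in ex[view]:
            table[feat][ex[2]] += 1
    return table

def predict(table, feats):
    """Vote with known features; return (label, confidence), or None if no feature is known."""
    votes = Counter()
    for feat in feats:
        votes.update(table.get(feat, Counter()))
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())

def co_train(labeled, unlabeled, rounds=5, per_round=1, threshold=0.6):
    """Alternate between the two views, moving confident predictions into the labeled set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled
            table = train(labeled, view)  # retrain on everything labeled so far
            scored = []
            for i, ex in enumerate(pool):
                guess = predict(table, ex[view])
                if guess is not None and guess[1] >= threshold:
                    scored.append((guess[1], i, guess[0]))
            # Most confident first; the sort is stable, so ties keep pool order.
            scored.sort(key=lambda t: t[0], reverse=True)
            picked = scored[:per_round]
            for _, i, label in picked:
                labeled.append((pool[i][0], pool[i][1], label))
            chosen = {i for _, i, _ in picked}
            pool = [ex for i, ex in enumerate(pool) if i not in chosen]
    return labeled

# Toy seed set and unlabeled candidates (invented for illustration).
seed = [
    (["suffix:炎"], ["ctx:治疗"], "disease"),
    (["suffix:草"], ["ctx:取"], "other"),
]
candidates = [
    (["suffix:炎"], ["ctx:缓解"]),
    (["suffix:症"], ["ctx:缓解"]),
    (["suffix:芪"], ["ctx:取"]),
]
result = co_train(seed, candidates)
for v1, v2, label in result[len(seed):]:
    print(v1[0], v2[0], label)
# suffix:炎 ctx:缓解 disease
# suffix:症 ctx:缓解 disease
# suffix:芪 ctx:取 other
```

In this toy run the suffix view labels the first candidate, which teaches the context view that "缓解" signals a disease; the context view then labels the second candidate even though its suffix "症" never appears in the seed set. This transfer between views is what distinguishes co-training from training a single classifier on the seed data alone.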