Ancient medical literature semantic annotation using hidden Markov models

Traditional Chinese medicine (TCM) has accumulated a large body of literature over the last 2,000 years: 1,059 volumes in total, more than 190,000 chapters, and more than 120,000,000 words. In previous work, researchers annotated phrases manually, one by one. Here we propose semantic annotation techniques in which semantic unit division and annotation are realized by constructing a corpus and a professional semantic unit dictionary. Based on this technology, a semantic annotation method is implemented using hidden Markov models; on the spleen putty genre, it achieves a micro-average F1 measure of 92.2% and a macro-average F1 measure of 87.6%.
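
To illustrate how a hidden Markov model can assign semantic labels to a sequence of semantic units, the following is a minimal sketch of Viterbi decoding in Python. It is not the authors' implementation. The label set (SYMPTOM, HERB, OTHER), the example units, and all probabilities are hypothetical values chosen for illustration only; in practice such parameters would be estimated from an annotated corpus.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely label sequence for obs under an HMM.

    Probabilities are handled in log space; unseen emissions are
    floored at `floor` (an assumption of this sketch) to avoid log(0).
    """
    if not obs:
        return []

    def lg(p):
        return math.log(max(p, floor))

    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: lg(start_p[s]) + lg(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lg(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + lg(emit_p[s].get(obs[t], 0.0))
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: labels, units, and probabilities are illustrative.
states = ["SYMPTOM", "HERB", "OTHER"]
start_p = {"SYMPTOM": 0.5, "HERB": 0.2, "OTHER": 0.3}
trans_p = {
    "SYMPTOM": {"SYMPTOM": 0.3, "HERB": 0.2, "OTHER": 0.5},
    "HERB": {"SYMPTOM": 0.2, "HERB": 0.5, "OTHER": 0.3},
    "OTHER": {"SYMPTOM": 0.3, "HERB": 0.5, "OTHER": 0.2},
}
emit_p = {
    "SYMPTOM": {"cough": 0.6, "fever": 0.4},
    "HERB": {"ginseng": 0.5, "licorice": 0.5},
    "OTHER": {"use": 0.7, "with": 0.3},
}

units = ["cough", "use", "ginseng", "licorice"]
tags = viterbi(units, states, start_p, trans_p, emit_p)
print(list(zip(units, tags)))
```

In this sketch the input is assumed to be already divided into semantic units, for example by dictionary lookup; the decoder only chooses one label per unit.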