Research on Syndrome Classification and Risk Factors Extraction of Tibetan Medicine Based on Clustering

Clustering, which partitions data into subsets, is an important method in data mining, machine learning, and artificial intelligence. It is an unsupervised learning method that addresses the problem of grouping unlabeled objects, and it requires no prior information about the data. A clustering analysis typically proceeds through feature selection, similarity calculation, choice of clustering algorithm, and validation of the results; clustering algorithms are distinguished by the methods they adopt at each of these steps. The purpose of this paper is to study syndrome classification and risk factor extraction in Tibetan medicine for common plateau diseases. Using diagnosis data on chronic atrophic gastritis provided by Qinghai Tibetan hospital, the paper applies the Elbow Method to choose the number of clusters and, after data preprocessing, uses Weka to classify syndromes with five clustering algorithms. Based on the experimental results and evaluation criteria, the most suitable algorithm is selected and the risk factors are extracted. The comparison of algorithms and experimental results indicates that the EM algorithm is effective and has clear advantages on discrete data.
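
As an illustration of how the Elbow Method can choose a cluster number, the following minimal Python sketch computes the within-cluster sum of squares (WCSS) for several values of k and picks the k where the curve bends most sharply. It is not the Weka workflow used in this paper. The k-means routine, the farthest-first initialization, the second-difference rule for locating the elbow, and the toy two-dimensional points are all assumptions made for the example; the points are synthetic and are not the hospital diagnosis data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Farthest-first initialization (an assumption, chosen to be deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point whose nearest existing center is farthest away.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_wcss(points, k, n_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centers = init_centers(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep the old center if empty.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow_k(wcss):
    """Pick k at the largest second difference of the WCSS curve.

    wcss[i] holds the WCSS for k = i + 1. This rule is one common heuristic,
    not a canonical definition of the elbow.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = (wcss[i - 1] - wcss[i]) - (wcss[i] - wcss[i + 1])
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Synthetic toy data: three small groups of five points each (illustrative only).
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

wcss = [kmeans_wcss(points, k) for k in range(1, 6)]
best_k = elbow_k(wcss)
print("WCSS for k = 1..5:", wcss)
print("Elbow at k =", best_k)
```

On this toy data the WCSS drops sharply up to k = 3 and then flattens, so the sketch selects k = 3. On real diagnosis data the curve is usually less clear-cut, and the chosen k should be checked against the evaluation criteria.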