Paper information - On similarity measures for cluster analysis in clinical laboratory examination databases

This paper discusses how a conventional similarity measure performs on a practical medical data set. The similarity measure used was a linear combination of the Mahalanobis distance between numerical attributes and the Hamming distance between nominal attributes. We performed clustering experiments on the meningoencephalitis data set using the similarity measure in conjunction with four types of clustering algorithms: single-linkage and complete-linkage agglomerative hierarchical clustering, Ward's method, and rough clustering. The usefulness of the similarity measure was evaluated from two viewpoints: (1) the quality of the generated clusters; and (2) the clinical reasonableness of the attributes used to generate the high-quality clusters. The results show that the best clusters were obtained using Ward's method, for which clinically reasonable attributes were selected. This suggests that the similarity measure would be applicable to medical data sets.
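As a rough illustration of the kind of measure the abstract describes, the following sketch combines a Mahalanobis distance over numerical attributes with a Hamming distance over nominal attributes. The abstract does not give the weights or any normalization, so the function name `mixed_dissimilarity`, the weights `w_num` and `w_nom`, the use of a pseudo-inverse covariance, and the scaling of the Hamming term to the fraction of mismatched attributes are all assumptions made here for illustration, not the authors' definition.

```python
import numpy as np


def mixed_dissimilarity(num, nom, w_num=0.5, w_nom=0.5):
    """Pairwise dissimilarity matrix for records with mixed attribute types.

    num: (n, p) array-like of numerical attributes
    nom: (n, q) array-like of nominal attributes (labels)
    w_num, w_nom: hypothetical weights of the linear combination
    """
    num = np.asarray(num, dtype=float)
    nom = np.asarray(nom)

    # Covariance of the numerical attributes; the pseudo-inverse is an
    # assumption that keeps the sketch usable if the matrix is singular.
    cov = np.atleast_2d(np.cov(num, rowvar=False))
    cov_inv = np.linalg.pinv(cov)

    # Mahalanobis distance between every pair of records.
    diff = num[:, None, :] - num[None, :, :]            # shape (n, n, p)
    sq = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    maha = np.sqrt(np.maximum(sq, 0.0))                 # clip rounding noise

    # Hamming distance, scaled here to the fraction of mismatched attributes.
    hamming = (nom[:, None, :] != nom[None, :, :]).mean(axis=2)

    return w_num * maha + w_nom * hamming


# Toy records (made up for illustration, not taken from the paper's data set).
numeric = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
nominal = [["a", "x"], ["a", "y"], ["b", "y"], ["a", "x"]]
D = mixed_dissimilarity(numeric, nominal)
```

The resulting matrix `D` is symmetric with a zero diagonal and could be handed to a hierarchical clustering routine that accepts precomputed dissimilarities. Note that Ward's method is normally defined for Euclidean distances, so how the paper adapts it to this measure is not something this sketch attempts to reproduce.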

[1] Z. Pawlak. Rough Sets: Theoretical Aspects of Reasoning about Data, 1991.

[2] Janusz Zalewski, et al. Rough sets: Theoretical aspects of reasoning about data, 1996.

[3] Michael R. Anderberg, et al. Cluster Analysis for Applications, 1973.

[4] Shusaku Tsumoto, et al. Indiscernibility degree of objects for evaluating simplicity of knowledge in the clustering procedure, 2001, Proceedings 2001 IEEE International Conference on Data Mining.