Singling out Ill-fit Items in a Classification. Application to the Taxonomy of Enterobacteriaceae
We address the problem of evaluating the quality of the cluster assignment of individual items in a classification. The problem can also be viewed as outlier detection in classifications. We describe simple methods for this task based on Naive Bayes classification. Applied to two existing classifications of 5313 strains of bacteria, the method indicated that one classification is far more robust than the other. Observations that fit their clusters poorly are typically items whose classification is also suspect on other grounds. Removing these elements from the data set, clustering the reduced data set, and adding the outliers back one by one yielded a clustering with a higher likelihood than the previously accepted classifications. Investigation of this new clustering led to suggested changes in the classification of 69 strains in the material.
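The abstract does not give the exact scoring rule. The following is a minimal sketch of one way to score how well each item fits its assigned cluster with Naive Bayes. It assumes binary (positive/negative) test results per strain, a Bernoulli Naive Bayes model with Laplace smoothing (`alpha`), and leave-one-out estimation, so an item does not vote for its own cluster. Each item's score is the posterior probability of the cluster it was assigned to, and low scores mark candidate ill-fit items. The function names, the `alpha` parameter, the 0.1 threshold, and the toy data are all hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import defaultdict

def fit_counts(items, labels):
    """Cluster sizes and, per cluster, the count of 1s for each binary feature."""
    n_features = len(items[0])
    sizes = defaultdict(int)
    ones = defaultdict(lambda: [0] * n_features)
    for x, c in zip(items, labels):
        sizes[c] += 1
        for j, v in enumerate(x):
            ones[c][j] += v
    return sizes, ones

def log_joint(x, c, sizes, ones, n_total, n_clusters, alpha):
    """log P(c) + sum_j log P(x_j | c) under a smoothed Bernoulli Naive Bayes model."""
    lp = math.log((sizes[c] + alpha) / (n_total + alpha * n_clusters))
    for j, v in enumerate(x):
        p1 = (ones[c][j] + alpha) / (sizes[c] + 2 * alpha)  # P(feature j = 1 | c)
        lp += math.log(p1 if v else 1.0 - p1)
    return lp

def fit_scores(items, labels, alpha=1.0):
    """Leave-one-out posterior probability of each item's own cluster (hypothetical helper)."""
    sizes, ones = fit_counts(items, labels)
    clusters = sorted(sizes)
    n = len(items)
    scores = []
    for x, own in zip(items, labels):
        # Temporarily remove the item from its own cluster's statistics.
        sizes[own] -= 1
        for j, v in enumerate(x):
            ones[own][j] -= v
        logs = {c: log_joint(x, c, sizes, ones, n - 1, len(clusters), alpha)
                for c in clusters}
        # Restore the counts before scoring the next item.
        sizes[own] += 1
        for j, v in enumerate(x):
            ones[own][j] += v
        # Normalize with the log-sum-exp trick to get the posterior of the own cluster.
        m = max(logs.values())
        z = sum(math.exp(l - m) for l in logs.values())
        scores.append(math.exp(logs[own] - m) / z)
    return scores

# Toy data: the last item looks like cluster "A" but is labeled "B".
items = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 1], [0, 1, 1], [1, 1, 0]]
labels = ["A", "A", "A", "B", "B", "B", "B"]
scores = fit_scores(items, labels)
flagged = [i for i, s in enumerate(scores) if s < 0.1]
print(flagged)  # [6]
```

Leaving each item out before scoring keeps a badly placed item from inflating the statistics of its own cluster, which matters most for small clusters. The flagged items would then be the candidates to remove before reclustering and to add back one by one.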