Similarity between categorisations
The problem of assessing the similarity of two groupings of data into categories is considered. Such groupings may arise from subjective categorisation by different people, or from repeated machine classification by an algorithm whose output depends on the order in which the data are submitted or which is non-deterministic. Several self-organising classification algorithms (e.g. fuzzy ART and fuzzy Min-Max) exhibit this order dependence. In either case, the reliability of the categorisation can only be assessed if some notion of distance between different categorisations is available, because a reliable method need not always produce exactly the same classification, only classifications that are sufficiently close to one another. With such a notion it is possible to conduct experimental work on any data set to check that the categorisation does not vary in any important way with the order in which observations are submitted to the classifier. Several different methods of measuring distance are proposed and investigated. Importantly, two of these measures are formulated in the appropriate algebraic setting for comparing partitions of sets: the lattice of equivalence relations on a finite set. Whilst this involves some computational difficulties, it is hoped that the benefits of an appropriate measure outweigh them. Two of these distance measures are compared on a known data set from the literature, and the results are reported.
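As a rough illustration of what a distance between two categorisations can look like, the sketch below treats each categorisation as an equivalence relation on the observations and counts the pairs of observations on which the two relations disagree. This pair-counting distance is a generic, well-known choice and is an assumption made here for illustration; it is not necessarily one of the measures proposed in the paper. The function names and the example label lists are hypothetical.

```python
from itertools import combinations


def equivalent_pairs(labels):
    """Return the set of unordered index pairs (i, j), i < j, that a
    categorisation places in the same category."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pair_disagreement_distance(labels_a, labels_b):
    """Count the pairs of observations grouped together by exactly one of
    the two categorisations (size of the symmetric difference of the two
    equivalence relations). Illustrative measure, not the paper's."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both categorisations must cover the same observations")
    return len(equivalent_pairs(labels_a) ^ equivalent_pairs(labels_b))


def normalised_distance(labels_a, labels_b):
    """Scale the pair-disagreement count to [0, 1] by the number of pairs."""
    n = len(labels_a)
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0:
        return 0.0
    return pair_disagreement_distance(labels_a, labels_b) / total_pairs


# Hypothetical category labels from two runs of an order-dependent
# classifier on the same five observations. Label names are arbitrary:
# only which observations share a category matters.
run_1 = [0, 0, 1, 1, 2]
run_2 = [1, 1, 0, 0, 0]

print(pair_disagreement_distance(run_1, run_2))
print(normalised_distance(run_1, run_2))
```

Because the comparison works on pairs rather than on label values, two runs that produce the same grouping under different category names are at distance zero, which matches the requirement that reliability means producing sufficiently close classifications rather than identical output.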