The Isolation Approach to Hierarchical Clustering
The idea of clusters being internally cohesive and externally isolated is consistently developed into a principle of hierarchical clustering. The principle rests on defining, for each subset of at least two objects, its degree of internal and external differentiation. Subsets whose external differentiation exceeds their internal differentiation are considered isolated groups, and it is shown that the resulting collection of all isolated groups forms an encaptic (hierarchical) structure. In an encaptic structure, any two groups are either completely disjoint or one is included (nested) in the other. The clustering principle is applicable to arbitrary symmetric difference measures between objects, does not rely on any special agglomerative or divisive type of clustering algorithm, always produces unambiguous results (even with arbitrary numbers of tied objects), and is straightforward to interpret. For example, the existence of solitary objects (which do not belong to any cluster) indicates that clustering may not be possible for all objects. At the extreme, this situation may result in complete inhibition of clustering, in which case the objects are evenly dispersed. External as well as internal degrees of differentiation strictly increase with (encaptic) levels of hierarchy, and equal levels of hierarchy are characterized by equal external but not necessarily equal internal degrees of differentiation. Further desirable features revealed by the joint consideration of external and internal differentiation are elaborated. The significance of the unrestricted choice of difference measures allowed by the isolation principle is demonstrated for a basic problem of phylogenetic reconstruction.
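The isolation criterion can be illustrated with a small brute-force sketch. The abstract does not state how the two degrees of differentiation are defined, so the code below makes an assumption: internal differentiation is taken as the largest pairwise difference within a subset, and external differentiation as the smallest difference between a member and a non-member. The function names, the example matrix `d`, and the restriction to proper subsets (the full set has no outside objects) are likewise hypothetical choices for illustration, not the paper's method.

```python
from itertools import combinations


def isolated_groups(d):
    """List proper subsets (size >= 2) that are isolated under the ASSUMED
    definitions: internal = max pairwise difference inside the subset,
    external = min difference between a member and a non-member.
    `d` is a symmetric matrix of pairwise differences (list of lists).
    Enumerates all subsets, so it is only practical for a few objects."""
    n = len(d)
    groups = []
    for size in range(2, n):  # proper subsets only
        for subset in combinations(range(n), size):
            members = set(subset)
            outside = [j for j in range(n) if j not in members]
            internal = max(d[i][j] for i, j in combinations(subset, 2))
            external = min(d[i][j] for i in subset for j in outside)
            if external > internal:
                groups.append((subset, internal, external))
    return groups


def is_encaptic(groups):
    """Check that any two groups are either disjoint or nested."""
    sets = [set(g[0]) for g in groups]
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))


# Hypothetical example: objects 0 and 1 are close, object 2 joins them at a
# higher level, and objects 3 and 4 form a separate pair.
d = [
    [0, 1, 3, 10, 10],
    [1, 0, 3, 10, 10],
    [3, 3, 0, 10, 10],
    [10, 10, 10, 0, 2],
    [10, 10, 10, 2, 0],
]

# Evenly dispersed objects: every pairwise difference is the same.
even = [[0 if i == j else 5 for j in range(4)] for i in range(4)]

if __name__ == "__main__":
    for subset, internal, external in isolated_groups(d):
        print(subset, "internal:", internal, "external:", external)
    print("encaptic:", is_encaptic(isolated_groups(d)))
    print("evenly dispersed groups:", isolated_groups(even))
```

Under these assumed definitions, the example yields the groups {0, 1}, {3, 4}, and {0, 1, 2}, which are pairwise disjoint or nested, and both degrees of differentiation increase from {0, 1} to the enclosing {0, 1, 2}. The evenly dispersed matrix yields no isolated group at all, which corresponds to the complete inhibition of clustering mentioned above.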