In order to compare clustering results against external criteria, a measure of agreement is needed. Since we assume that each gene is assigned to only one class in the external criterion and to only one cluster, measures of agreement between two partitions can be used. Given a set of $n$ objects $S = \{O_1, \ldots, O_n\}$, suppose $U = \{u_1, \ldots, u_R\}$ and $V = \{v_1, \ldots, v_C\}$ represent two different partitions of the objects in $S$ such that $\bigcup_{i=1}^{R} u_i = S = \bigcup_{j=1}^{C} v_j$ and $u_i \cap u_{i'} = \emptyset = v_j \cap v_{j'}$ for $1 \le i \ne i' \le R$ and $1 \le j \ne j' \le C$. Suppose that $U$ is our external criterion and $V$ is a clustering result. Let $a$ be the number of pairs of objects that are placed in the same class in $U$ and in the same cluster in $V$, $b$ be the number of pairs of objects in the same class in $U$ but not in the same cluster in $V$, $c$ be the number of pairs of objects in the same cluster in $V$ but not in the same class in $U$, and $d$ be the number of pairs of objects in different classes and different clusters in both partitions. The quantities $a$ and $d$ can be interpreted as agreements, and $b$ and $c$ as disagreements. The Rand index [Rand, 1971] is simply $\frac{a + d}{a + b + c + d}$. The Rand index lies between 0 and 1. When the two partitions agree perfectly, the Rand index is 1.

A problem with the Rand index is that its expected value for two random partitions does not take a constant value (say zero). The adjusted Rand index proposed by [Hubert and Arabie, 1985] assumes the generalized hypergeometric distribution as the model of randomness, i.e., the $U$ and $V$ partitions are picked at random such that the numbers of objects in the classes and clusters are fixed. Let $n_{ij}$ be the number of objects that are in both class $u_i$ and cluster $v_j$. Let $n_{i\cdot}$ and $n_{\cdot j}$ be the number of objects in class $u_i$ and cluster $v_j$ respectively. The notations are illustrated in Table 1.
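The pair counts and the Rand index defined above can be illustrated with a minimal Python sketch. It assumes each partition is given as a list of labels, one per object (the function names `pair_counts` and `rand_index` are hypothetical, chosen for illustration). It relies on the standard counting identities that the number of pairs sharing both a class and a cluster is the sum of "n_ij choose 2" over the contingency table, and that the pairs sharing a class (or a cluster) are the sums of "n_i. choose 2" (or "n_.j choose 2") over the marginals.

```python
from collections import Counter
from math import comb


def pair_counts(classes, clusters):
    """Return (a, b, c, d) for two labelings of the same n objects.

    classes  -- external criterion U: class label of each object
    clusters -- clustering result V: cluster label of each object
    """
    if len(classes) != len(clusters):
        raise ValueError("both labelings must cover the same objects")
    n = len(classes)

    # Contingency table n_ij and its marginals n_i. and n_.j
    n_ij = Counter(zip(classes, clusters))
    n_i = Counter(classes)
    n_j = Counter(clusters)

    # a: pairs in the same class and the same cluster
    a = sum(comb(v, 2) for v in n_ij.values())
    # b: pairs in the same class but different clusters
    b = sum(comb(v, 2) for v in n_i.values()) - a
    # c: pairs in the same cluster but different classes
    c = sum(comb(v, 2) for v in n_j.values()) - a
    # d: all remaining pairs (different class and different cluster)
    d = comb(n, 2) - a - b - c
    return a, b, c, d


def rand_index(classes, clusters):
    """Rand index (a + d) / (a + b + c + d)."""
    a, b, c, d = pair_counts(classes, clusters)
    total = a + b + c + d
    if total == 0:
        raise ValueError("at least two objects are required")
    return (a + d) / total


# Illustrative labels only (not data from the paper): six objects,
# two classes in U, three clusters in V.
U = [0, 0, 0, 1, 1, 1]
V = [0, 0, 1, 1, 2, 2]
print(pair_counts(U, V))  # (2, 4, 1, 8)
print(rand_index(U, V))   # 10 / 15
```

In this toy example, 10 of the 15 object pairs are agreements, so the Rand index is 2/3. Counting through the contingency table avoids enumerating all pairs explicitly, which matters when the number of objects is large.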
[1] G. W. Milligan et al. A Study of the Comparability of External Criteria for Hierarchical Cluster Analysis. Multivariate Behavioral Research, 1986.
[2] William M. Rand et al. Objective Criteria for the Evaluation of Clustering Methods. 1971.
[3] D. Botstein et al. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 1998.
[4] Michael R. Anderberg et al. Cluster Analysis for Applications. 1973.
[5] J. Barker et al. Large-scale temporal gene expression mapping of central nervous system development. Proceedings of the National Academy of Sciences of the United States of America, 1998.
[6] Ron Shamir et al. Clustering Gene Expression Patterns. J. Comput. Biol., 1999.
[7] Anil K. Jain et al. Algorithms for Clustering Data. 1988.