A Binary Integer Program to Maximize the Agreement Between Partitions

This research note focuses on a problem where the cluster sizes for two partitions of the same object set are assumed known; however, the actual assignments of objects to clusters are unknown for one or both partitions. The objective is to find a contingency table that produces maximum possible agreement between the two partitions, subject to constraints that the row and column marginal frequencies for the table correspond exactly to the cluster sizes for the partitions. This problem was described by H. Messatfa (Journal of Classification, 1992, pp. 5–15), who provided a heuristic procedure based on the linear transportation problem. We present an exact solution procedure using binary integer programming. We demonstrate that our proposed method efficiently obtains optimal solutions for problems of practical size.
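
To make the problem statement concrete, the following minimal sketch enumerates every contingency table with the prescribed marginals and keeps the one with the highest agreement. It is not the binary integer program proposed here, nor Messatfa's transportation-based heuristic; it is a brute-force illustration that is only feasible for very small tables. The abstract does not name the agreement measure, so the sketch assumes the Rand index [1]. The function names (`rand_index`, `tables_with_margins`, `max_agreement_table`) are hypothetical.

```python
from math import comb


def rand_index(table):
    """Rand index computed from a contingency table (assumed agreement measure)."""
    n = sum(sum(row) for row in table)
    pairs = comb(n, 2)
    # Pairs of objects placed together in both partitions.
    same_both = sum(comb(c, 2) for row in table for c in row)
    # Pairs placed together in the row partition / in the column partition.
    same_rows = sum(comb(sum(row), 2) for row in table)
    same_cols = sum(comb(sum(col), 2) for col in zip(*table))
    # Agreements = together in both + apart in both.
    return (pairs + 2 * same_both - same_rows - same_cols) / pairs


def _rows(total, caps):
    """Yield nonnegative integer rows summing to `total`, bounded cellwise by `caps`."""
    if not caps:
        if total == 0:
            yield []
        return
    for v in range(min(total, caps[0]) + 1):
        for rest in _rows(total - v, caps[1:]):
            yield [v] + rest


def tables_with_margins(row_sums, col_sums):
    """Yield every nonnegative integer table with the given row and column sums."""
    if len(row_sums) == 1:
        # The last row is forced by the remaining column sums.
        if row_sums[0] == sum(col_sums):
            yield [list(col_sums)]
        return
    for row in _rows(row_sums[0], list(col_sums)):
        reduced = [c - v for c, v in zip(col_sums, row)]
        for tail in tables_with_margins(row_sums[1:], reduced):
            yield [row] + tail


def max_agreement_table(row_sums, col_sums):
    """Exhaustively search for a table of maximum Rand index (tiny inputs only)."""
    if sum(row_sums) != sum(col_sums):
        raise ValueError("row and column cluster sizes must have the same total")
    return max(tables_with_margins(row_sums, col_sums), key=rand_index)


if __name__ == "__main__":
    best = max_agreement_table([3, 2], [3, 2])
    print(best, rand_index(best))
```

With fixed marginals, the Rand index increases with the sum of squared cell counts, so the search favors tables that concentrate objects in as few cells as the cluster sizes allow. The number of feasible tables grows rapidly with the number of objects and clusters, which is why an exact method that avoids full enumeration is of interest.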

[1] William M. Rand, et al. Objective Criteria for the Evaluation of Clustering Methods, 1971.

[2] C. Mallows, et al. A Method for Comparing Two Hierarchical Clusterings, 1983.

[3] Dorit S. Hochbaum, et al. Strongly Polynomial Algorithms for the Quadratic Transportation Problem with a Fixed Number of Sources, 1994, Math. Oper. Res.

[4] R. Light. Measures of Response Agreement for Qualitative Data: Some Generalizations and Alternatives, 1971.

[5] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.

[6] L. A. Goodman, et al. Measures of Association for Cross Classifications, 1979.

[7] Ahmed Albatineh, et al. On Similarity Indices and Correction for Chance Agreement, 2006, J. Classif.

[8] D. Steinley. Properties of the Hubert-Arabie Adjusted Rand Index, 2004, Psychological Methods.

[9] L. Hubert. Assignment Methods in Combinatorial Data Analysis, 1986.

[10] A. Hoffman, et al. The Variation of the Spectrum of a Normal Matrix, 1953.

[11] L. Hubert, et al. Evaluating the Conformity of Sociometric Measurements, 1978.

[12] H. Messatfa. An Algorithm to Maximize the Agreement Between Partitions, 1992.

[13] P. Jaccard. The Distribution of the Flora in the Alpine Zone, 1912.

[14] Ron Shamir, et al. A Polynomial Algorithm for an Integer Quadratic Non-Separable Transportation Problem, 1992, Math. Program.

[15] L. Katz, et al. A Proposed Index of the Conformity of One Sociometric Measurement to Another, 1953.