A permutation-based algorithm for block clustering

Hartigan (1972) discusses the direct clustering of a matrix of data into homogeneous blocks. He introduces a stepwise divisive method for block clustering within a certain class of block structures which induce clustering trees for both row and column margins. While this class of structures is appealing, the stopping criterion for his method, which is based on asymptotic theory and the assumption that the individual elements of the data matrix are normally distributed, is quite restrictive. In this paper we propose a permutation-based algorithm for block clustering within the same class of block structures. By using permutation arguments to decide where to split and when to stop, our algorithm becomes applicable in a wide variety of cases, including matrices of categorical data and matrices of small-to-moderate size. In addition, our algorithm offers considerable flexibility in how block homogeneity is defined. The algorithm is studied in a series of simulation experiments on matrices of known structure, and illustrated in examples drawn from the fields of taxonomy, political science, and data architecture.
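The idea of using a permutation argument to decide where to split and when to stop can be illustrated with a minimal sketch. This is not the authors' procedure: the homogeneity measure (within-block sum of squares), the restriction to a single two-way row split, the null model (all entries of the block treated as exchangeable), and the names `sum_sq`, `best_row_split`, and `permutation_p_value` are assumptions made for illustration only.

```python
import random


def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_row_split(block):
    """Return (reduction, (rows_a, rows_b)) for the best two-way row split.

    Assumption: rows are ordered by their means and every cut point in
    that order is tried; the score is the drop in total sum of squares.
    Returns (0.0, None) when no split improves homogeneity.
    """
    order = sorted(range(len(block)),
                   key=lambda i: sum(block[i]) / len(block[i]))
    total = sum_sq([v for row in block for v in row])
    best = (0.0, None)
    for cut in range(1, len(order)):
        top = [v for i in order[:cut] for v in block[i]]
        bottom = [v for i in order[cut:] for v in block[i]]
        reduction = total - sum_sq(top) - sum_sq(bottom)
        if reduction > best[0]:
            best = (reduction, (order[:cut], order[cut:]))
    return best


def permutation_p_value(block, n_perm=199, seed=0):
    """Permutation p-value for the best row split of `block`.

    Assumed null: the block is homogeneous, so its entries can be
    shuffled freely. The p-value is the share of shuffled blocks whose
    best split is at least as good as the observed one.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(block), len(block[0])
    observed, groups = best_row_split(block)
    flat = [v for row in block for v in row]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(flat)
        shuffled = [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
        if best_row_split(shuffled)[0] >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm), groups
```

In a stepwise divisive scheme of this kind, a block would be split when the p-value falls below a chosen threshold and left intact otherwise, so the same test serves as both the splitting rule and the stopping rule. Because the reference distribution is built from the data themselves, no normality assumption is required, and the sum of squares could be swapped for another homogeneity measure, for example one suited to categorical entries.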

[1] W. T. Williams, et al. Multivariate Methods in Plant Ecology: IV. Nodal Analysis, 1962.

[2] R. Sokal, et al. Principles of numerical taxonomy, 1965.

[3] P. Billingsley, et al. Convergence of Probability Measures, 1969.

[4] Stephen B. Deutsch, et al. An Ordering Algorithm for Analysis of Data Arrays, 1971, Oper. Res.

[5] J. Hartigan. Direct Clustering of a Data Matrix, 1972.

[6] Lawrence Hubert, et al. Problems of seriation using a subject by item response matrix, 1974.

[7] M. Hill. Correspondence Analysis: A Neglected Multivariate Method, 1974.

[8] Brian Everitt, et al. Cluster analysis, 1974.

[9] P. Arabie, et al. An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling, 1975.

[10] John A. Hartigan, et al. Clustering Algorithms, 1975.

[11] J. M. Norman, et al. A Dynamic Programming Formulation with Diverse Applications, 1976.

[12] J. A. Hartigan, et al. Modal Blocks in Dentition of West Coast Mammals, 1976.

[13] Phipps Arabie, et al. Constructing blockmodels: How and why, 1978.

[14] P. Diaconis, et al. Generating a random permutation with random transpositions, 1981.

[15] Leo A. Goodman, et al. Criteria for Determining Whether Certain Categories in a Cross-Classification Table Should Be Combined, with Special Reference to Occupational Categories in an Occupational Mobility Table, 1981, American Journal of Sociology.

[16] Reginald G. Golledge, et al. Matrix reorganization and dynamic programming: Applications to paired comparisons and unidimensional seriation, 1981.

[17] P. Holland, et al. An Exponential Family of Probability Distributions for Directed Graphs, 1981.

[18] Willem J. Heiser, et al. Analyzing rectangular tables by joint and constrained multidimensional scaling, 1983.

[19] George W. Furnas, et al. The estimation of ultrametric and path length trees from rectangular proximity data, 1984.

[20] Cyrus R. Mehta, et al. Computing an Exact Confidence Interval for the Common Odds Ratio in Several 2×2 Contingency Tables, 1985.

[21] Zvi Gilula, et al. Grouping and Association in Contingency Tables: An Exploratory Canonical Correlation Approach, 1986.

[22] Herbert Edelsbrunner, et al. Algorithms in Combinatorial Geometry, 1987, EATCS Monographs in Theoretical Computer Science.

[23] Yuchung J. Wang, et al. Stochastic Blockmodels for Directed Graphs, 1987.

[24] D. Aldous. On the Markov Chain Simulation Method for Uniform Combinatorial Distributions and Simulated Annealing, 1987, Probability in the Engineering and Informational Sciences.

[25] G. Y. Wong, et al. Bayesian Models for Directed Graphs, 1987.

[26] Michael Greenacre, et al. Clustering the rows and columns of a contingency table, 1988.