Clustering for contingency tables: boxes and partitions

Correspondence analysis (CA) is an effective tool for analyzing the interrelations between the rows and columns of two-way contingency data. This paper develops a discrete version of the method, box clustering, using an approximation version of the CA model extended to the case in which the CA factor values are required to be Boolean. Several properties of the proposed SEFIT-BOX algorithm are proved to facilitate interpretation of its output. It is also shown that two known partitioning algorithms (each applied within the row set or the column set only) can be regarded as locally optimal algorithms for fitting the model, and extensions of these algorithms to the simultaneous partitioning of rows and columns are proposed.
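As a rough illustration of the objects mentioned above, the sketch below computes the relative deviations from independence that CA conventionally analyzes, q_ij = p_ij / (p_i+ p_+j) - 1, and builds the 0/1 pattern produced by a pair of Boolean row and column vectors, which is nonzero only on a "box" (a row subset crossed with a column subset). The function names `relative_deviations` and `box_indicator` and the example table are hypothetical, chosen only for this illustration; this is not the SEFIT-BOX algorithm, whose steps are not given in the abstract.

```python
def relative_deviations(table):
    """Return q[i][j] = p_ij / (p_i+ * p_+j) - 1 for a table of counts.

    Assumes every row and column total is positive (an assumption of this
    sketch, not a statement from the paper).
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    # With counts n_ij and grand total N: q_ij = n_ij * N / (r_i * c_j) - 1.
    return [
        [table[i][j] * total / (row_sums[i] * col_sums[j]) - 1.0
         for j in range(len(col_sums))]
        for i in range(len(row_sums))
    ]


def box_indicator(n_rows, n_cols, rows, cols):
    """Outer product of Boolean row and column vectors: 1 inside the box, 0 outside."""
    v = [1 if i in rows else 0 for i in range(n_rows)]
    w = [1 if j in cols else 0 for j in range(n_cols)]
    return [[vi * wj for wj in w] for vi in v]


# Hypothetical 2x2 contingency table, for illustration only.
counts = [[20, 5],
          [5, 20]]
q = relative_deviations(counts)
box = box_indicator(2, 2, rows={0}, cols={0})
```

Here `q` is positive where a cell holds more observations than independence would predict and negative where it holds fewer, while `box` marks the cells covered by the chosen row and column subsets.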
