Pairwise Data Clustering by Deterministic Annealing

Partitioning a data set and extracting its hidden structure are tasks that arise in many application areas of pattern recognition and of speech and image processing. Pairwise data clustering is a combinatorial optimization method for data grouping that extracts hidden structure from proximity data. We describe a deterministic annealing approach to pairwise clustering which shares the robustness properties of maximum entropy inference. The resulting Gibbs probability distributions are estimated by mean-field approximation. We also discuss a new structure-preserving algorithm that clusters dissimilarity data and simultaneously embeds these data in a Euclidean vector space; it can be used for dimensionality reduction and data visualization. The suggested embedding algorithm, which outperforms conventional approaches, has been implemented to analyze dissimilarity data from protein analysis and from linguistics. The algorithm for pairwise data clustering is used to segment textured images.
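As a rough illustration of the mean-field annealing idea summarized above, the following Python sketch clusters a dissimilarity matrix D using soft assignments. It is a minimal sketch under stated assumptions, not the implementation from the paper: the function name `pairwise_da_clustering`, the geometric cooling schedule, the small symmetry-breaking noise, and the specific mean-field form E[i, v] = (1/n_v) sum_k M[k, v] D[i, k] - (1/(2 n_v^2)) sum_{j,k} M[j, v] M[k, v] D[j, k] are all illustrative choices.

```python
import numpy as np

def pairwise_da_clustering(D, n_clusters, t_final=1e-3, cooling=0.9,
                           n_inner=20, seed=0):
    """Hypothetical sketch: mean-field deterministic annealing on a
    symmetric dissimilarity matrix D of shape (N, N)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    T = float(D.max()) or 1.0              # assumed starting temperature
    # Soft assignments M[i, v]: probability that object i is in cluster v.
    M = rng.dirichlet(np.ones(n_clusters), size=n)
    while T > t_final:
        # Small noise so the iteration can leave the symmetric fixed point
        # (an assumption of this sketch, not taken from the source).
        M = M + 1e-3 * rng.random(M.shape)
        M /= M.sum(axis=1, keepdims=True)
        for _ in range(n_inner):
            n_v = M.sum(axis=0) + 1e-12            # soft cluster sizes
            A = D @ M                              # sum_k D[i, k] M[k, v]
            B = np.einsum("jv,jv->v", M, A)        # sum_{j,k} M[j,v] D[j,k] M[k,v]
            E = A / n_v - B / (2.0 * n_v ** 2)     # assumed mean fields
            logits = -E / T
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            M_new = np.exp(logits)
            M_new /= M_new.sum(axis=1, keepdims=True)    # Gibbs assignments
            converged = np.abs(M_new - M).max() < 1e-6
            M = M_new
            if converged:
                break
        T *= cooling                               # geometric cooling
    return M.argmax(axis=1), M

# Usage: two well-separated groups on a line, squared distances as D.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = (x[:, None] - x[None, :]) ** 2
labels, M = pairwise_da_clustering(D, n_clusters=2)
```

For squared Euclidean dissimilarities, the assumed mean field reduces to the squared distance from each point to the soft cluster centroid, so in that special case the sketch behaves like annealed soft k-means. Only the matrix D is used, though, which is what makes the approach applicable to general proximity data.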
