Dendrogram Seriation Using Simulated Annealing

Seriation is the ordering of the leaves of a dendrogram so that leaves representing similar items are placed near each other according to some metric, within the constraints of the cluster tree. Such an ordering greatly aids interpretation of the relations represented by the dendrogram and reduces the visual misinterpretation that arises when an arbitrary ordering places unrelated items from different sub-trees next to each other. The seriation method presented here uses simulated annealing to find an approximately optimal dendrogram ordering by minimizing a penalty function. The method employs a ‘similarity weighted distance’ penalty function that tends to avoid the artifacts introduced by the traveling salesman problem algorithms commonly used for dendrogram seriation. Examples show the effectiveness of the method in presenting dendrograms of the structure of a social network, and further examples apply it to interpreting the structure of a network of journal papers on anthrax research.
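
The general idea can be illustrated with a minimal sketch in Python. It is not the authors' implementation, and several details are assumptions made for illustration only: the dendrogram is a binary tree held as nested tuples with integer leaves, the only move is swapping the two children of a randomly chosen internal node (which keeps every cluster contiguous), the temperature follows a geometric cooling schedule, and the penalty is taken to be the sum over leaf pairs of similarity times positional distance. The abstract names a 'similarity weighted distance' penalty but does not give its exact form, so the `penalty` function below is a hypothetical stand-in.

```python
import math
import random


def leaves(tree):
    """Return leaf labels of a nested-tuple binary tree, left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def count_internal(tree):
    """Count internal (non-leaf) nodes."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + count_internal(tree[0]) + count_internal(tree[1])


def flip(tree, k):
    """Swap the children of the k-th internal node in preorder.

    Returns (new_tree, k_left); k_left is -1 once the swap has happened.
    """
    if not isinstance(tree, tuple):
        return tree, k
    left, right = tree
    if k == 0:
        return (right, left), -1
    new_left, k = flip(left, k - 1)
    if k < 0:
        return (new_left, right), k
    new_right, k = flip(right, k)
    return (new_left, new_right), k


def penalty(order, sim):
    """Assumed penalty: sum of similarity * positional distance over leaf pairs."""
    total = 0.0
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            total += sim[order[i]][order[j]] * (j - i)
    return total


def anneal(tree, sim, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over child swaps; returns (best_tree, best_cost)."""
    rng = random.Random(seed)
    n_internal = count_internal(tree)
    current, cur_cost = tree, penalty(leaves(tree), sim)
    best, best_cost = current, cur_cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / steps)
        candidate, _ = flip(current, rng.randrange(n_internal))
        cost = penalty(leaves(candidate), sim)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            current, cur_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best, best_cost
```

Because each move only swaps siblings, every ordering visited is one the cluster tree permits; the annealing schedule and the penalty can be replaced independently of that move set.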
