Methods of Hierarchical Clustering

We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations available in R and other software environments. We then look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear-time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm.
