Likelihood Based Hierarchical Clustering and Network Topology Identification

This paper develops a new method for hierarchical clustering based on a generative dendritic cluster model. The objects are viewed as being generated through a tree-structured refinement process. In certain problems, such as genetic studies and network topology identification, this generative model naturally captures the physical mechanisms responsible for relationships among objects. The networking problem is examined in some detail to illustrate the new clustering method. More generally, the generative model need not represent actual physical mechanisms, but it nonetheless provides a means for dealing with errors in the similarity matrix, simultaneously promoting two desirable features in clustering: intra-class similarity and inter-class dissimilarity.
