Automatic estimation of the number of clusters in hierarchical data clustering

Emergent pattern recognition is essential for a real-time monitoring network that must recognize emerging behavior of a physical system from sensor measurement data. One of the challenges in achieving effective emergent pattern recognition is determining the number of data clusters automatically. This paper studies the performance of two approaches to estimating the number of clusters: model-based clustering and locating the knee of an evaluation graph. The working principles of both methods are presented. Both methods have been applied to the classification of damage patterns for a benchmark civil structure, and their performance in determining the number of clusters, together with their classification success rates, is discussed.
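To make the idea of a knee in an evaluation graph concrete, the following is a minimal sketch and not necessarily the procedure used in the paper. It assumes the evaluation graph is a list of (number of clusters, evaluation value) pairs and picks the point farthest from the straight line joining the first and last points. The function name `knee_index` and the sample values in `merge_cost` are hypothetical and chosen only for illustration.

```python
from math import hypot


def knee_index(xs, ys):
    """Return the index of the point farthest from the chord that joins
    the first and last points of the curve (a simple knee heuristic)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) points of equal length")
    x0, y0 = xs[0], ys[0]
    dx, dy = xs[-1] - x0, ys[-1] - y0
    norm = hypot(dx, dy)
    if norm == 0:
        raise ValueError("first and last points must differ")
    best_i, best_d = 0, -1.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Perpendicular distance from (x, y) to the chord.
        d = abs(dy * (x - x0) - dx * (y - y0)) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i


# Hypothetical evaluation graph: candidate cluster counts and an
# illustrative evaluation value that flattens after four clusters.
ks = [1, 2, 3, 4, 5, 6, 7, 8]
merge_cost = [100.0, 60.0, 30.0, 10.0, 8.0, 7.0, 6.5, 6.0]
k_hat = ks[knee_index(ks, merge_cost)]
print(k_hat)
```

Because the distance depends on the relative scale of the two axes, it may be worth normalizing both to a common range before applying a heuristic of this kind.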
