Hierarchical model-based clustering of relational data with aggregates

This paper proposes a propositional method for hierarchical model-based clustering of relational data. We define a new type of aggregate, the frequency aggregate, which has a vector data type and records not only the observed values of an attribute but also their distribution. A hierarchical agglomerative clustering algorithm with a log-likelihood distance is first applied to produce a tentative clustering of the aggregated data. A mixture model-based method using the EM algorithm is then developed to refine this clustering by relocation, with the Bayesian Information Criterion (BIC) used to determine the optimal number of clusters.
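As a rough illustration of the frequency aggregate, the sketch below collapses a one-to-many relation into one count vector per target object, with one slot per value in the attribute's domain. Nonzero slots record which values were observed, and normalizing the counts gives their distribution. The function names `frequency_aggregate` and `to_distribution` and the customer/category example are hypothetical, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def frequency_aggregate(
    rows: Iterable[Tuple[Hashable, Hashable]],
    domain: List[Hashable],
) -> Dict[Hashable, List[int]]:
    """Collapse a one-to-many relation into one count vector per key.

    `rows` holds (key, attribute value) pairs from the related table; slot j of
    each vector counts how often domain[j] occurs for that key.
    """
    index = {value: j for j, value in enumerate(domain)}
    result: Dict[Hashable, List[int]] = {}
    for key, value in rows:
        vector = result.setdefault(key, [0] * len(domain))
        vector[index[value]] += 1
    return result


def to_distribution(counts: List[int]) -> List[float]:
    """Relative frequencies of a count vector (all zeros if every count is zero)."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Hypothetical example: customers (keys) and the product categories they bought.
rows = [("c1", "a"), ("c1", "b"), ("c1", "a"), ("c2", "c")]
agg = frequency_aggregate(rows, ["a", "b", "c"])
# agg == {"c1": [2, 1, 0], "c2": [0, 0, 1]}
```

Keeping raw counts rather than only proportions preserves how many related records each object has, which a count-based likelihood can use.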
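The abstract does not define the log-likelihood distance. This sketch assumes that each cluster's pooled frequency vector is modeled by a single multinomial, and that the distance between two clusters is the drop in maximized log-likelihood when they are merged. Under that assumption, an agglomerative pass repeatedly merges the cheapest pair. The helpers `multinomial_loglik`, `merge_cost`, and `agglomerate` are hypothetical, and the naive O(n^3) search is for clarity only.

```python
import math
from typing import List


def multinomial_loglik(counts: List[float]) -> float:
    """Maximized multinomial log-likelihood sum_l n_l * log(n_l / N), with 0*log 0 = 0."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)


def merge_cost(a: List[float], b: List[float]) -> float:
    """Loss in log-likelihood when clusters a and b must share one multinomial (>= 0)."""
    merged = [x + y for x, y in zip(a, b)]
    return multinomial_loglik(a) + multinomial_loglik(b) - multinomial_loglik(merged)


def agglomerate(vectors: List[List[float]], k: int) -> List[List[int]]:
    """Merge the cheapest pair of clusters until k remain; returns member indices."""
    clusters = [[i] for i in range(len(vectors))]
    sums = [list(v) for v in vectors]  # pooled frequency vector of each cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = merge_cost(sums[i], sums[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so popping j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
        sums[i] = [x + y for x, y in zip(sums[i], sums.pop(j))]
    return clusters


vectors = [[5, 0], [4, 1], [0, 5], [1, 4]]
clusters = agglomerate(vectors, k=2)
# clusters == [[0, 1], [2, 3]]
```

Merging two clusters with identical proportions costs nothing. The cost grows as their value distributions diverge, which is why the two "mostly a" objects and the two "mostly b" objects end up together.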
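The relocation step and the BIC-based choice of the number of clusters can be sketched under the assumption that the frequency vectors follow a finite mixture of multinomials. EM starts from a hard partition, such as one produced by a hierarchical pass, and each candidate number of components is scored with the Schwarz form BIC = -2 log L + p log n, where lower is better. The function names, the small smoothing constant `alpha`, and the toy data are hypothetical, and the paper's exact model and BIC sign convention may differ.

```python
import math
from typing import List, Tuple


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))); assumes at least one finite value."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def em_multinomial_mixture(
    X: List[List[float]], labels: List[int], k: int,
    iters: int = 200, alpha: float = 1e-3, tol: float = 1e-9,
) -> Tuple[List[float], List[List[float]], float]:
    """Relocate a hard partition with EM for a k-component multinomial mixture.

    Returns (mixing weights, per-component category probabilities, log-likelihood).
    The log-likelihood omits the multinomial coefficient, which is the same for
    every model and therefore cancels when BIC values are compared.
    """
    n, d = len(X), len(X[0])
    # One-hot responsibilities taken from the initial partition.
    resp = [[1.0 if labels[i] == c else 0.0 for c in range(k)] for i in range(n)]
    prev = -math.inf
    for _ in range(iters):
        # M-step: mixing weights and category probabilities (alpha avoids log(0)).
        weights = [sum(r[c] for r in resp) / n for c in range(k)]
        theta = []
        for c in range(k):
            counts = [alpha + sum(resp[i][c] * X[i][l] for i in range(n)) for l in range(d)]
            total = sum(counts)
            theta.append([v / total for v in counts])
        # E-step: joint log p(x_i, z_i = c), normalized into responsibilities.
        loglik = 0.0
        for i in range(n):
            logp = [
                math.log(weights[c]) + sum(X[i][l] * math.log(theta[c][l]) for l in range(d))
                if weights[c] > 0 else -math.inf
                for c in range(k)
            ]
            norm = log_sum_exp(logp)
            loglik += norm
            resp[i] = [math.exp(v - norm) for v in logp]
        if loglik - prev < tol:
            break
        prev = loglik
    return weights, theta, loglik


def bic(loglik: float, k: int, d: int, n: int) -> float:
    """Schwarz BIC with p = (k - 1) weights + k * (d - 1) probabilities; lower is better."""
    p = (k - 1) + k * (d - 1)
    return -2.0 * loglik + p * math.log(n)


X = [[5, 0], [4, 1], [5, 1], [0, 5], [1, 4], [1, 5]]
scores = {}
for k, labels in {1: [0] * 6, 2: [0, 0, 0, 1, 1, 1]}.items():
    _, _, ll = em_multinomial_mixture(X, labels, k)
    scores[k] = bic(ll, k, d=2, n=len(X))
best_k = min(scores, key=scores.get)
# best_k == 2
```

The log p penalty grows with the number of components, so a second component is kept only when it raises the log-likelihood by enough to pay for its extra parameters.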

Sally I. McClean | Kenneth Adamson | Mary Shapcott | Jianzhong Chen
