Hierarchical model-based clustering of relational data with aggregates

This paper proposes a propositional method for hierarchical model-based clustering of relational data. We define a new type of aggregate, the frequency aggregate, which has a vector data type and records not only the observed values of an attribute but also their distribution. A hierarchical agglomerative clustering algorithm with a log-likelihood distance is first applied to produce a tentative clustering of the aggregated data. A mixture model-based method using the EM algorithm is then developed to perform a further relocation clustering, in which the Bayesian Information Criterion (BIC) is used to determine the optimal number of clusters.
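The pipeline summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes a single categorical attribute per entity and a mixture of multinomials as the model. The log-likelihood distance is taken to be the drop in pooled multinomial log-likelihood caused by a merge. Additive smoothing `alpha` is applied in the M-step, and BIC uses the -2 log L + p log n convention, where lower is better. All function and parameter names (`frequency_aggregate`, `hac`, `em`, `bic`, `choose_k`, `alpha`) are hypothetical.

```python
import math
from collections import defaultdict


def frequency_aggregate(rows, key, attr, domain):
    """Hypothetical frequency aggregate: one count vector per entity, one slot per value in `domain`."""
    index = {v: i for i, v in enumerate(domain)}
    agg = defaultdict(lambda: [0] * len(domain))
    for row in rows:
        agg[row[key]][index[row[attr]]] += 1
    return dict(agg)


def _xi(counts):
    """Multinomial log-likelihood of pooled counts at their maximum-likelihood estimate."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)


def hac(vectors, k):
    """Agglomerative clustering: repeatedly merge the pair whose merge loses the least log-likelihood
    (an assumed form of the log-likelihood distance)."""
    clusters = [[i] for i in range(len(vectors))]
    sums = [list(v) for v in vectors]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = [x + y for x, y in zip(sums[a], sums[b])]
                dist = _xi(sums[a]) + _xi(sums[b]) - _xi(merged)
                if best is None or dist < best[0]:
                    best = (dist, a, b, merged)
        _, a, b, merged = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
        sums.pop(b)
        sums[a] = merged
    return clusters


def em(vectors, labels, k, iters=50, alpha=0.01):
    """EM for a mixture of multinomials, started from hard labels (the relocation step)."""
    n, d = len(vectors), len(vectors[0])
    resp = [[1.0 if labels[i] == j else 0.0 for j in range(k)] for i in range(n)]
    ll = 0.0
    for _ in range(iters):
        # M-step: mixing weights and smoothed per-component value probabilities.
        weights, probs = [], []
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights.append(max(nj / n, 1e-12))  # floor avoids log(0) for empty components
            tot = [alpha + sum(r[j] * x[v] for r, x in zip(resp, vectors)) for v in range(d)]
            s = sum(tot)
            probs.append([t / s for t in tot])
        # E-step: responsibilities via log-sum-exp. ll omits the multinomial coefficient,
        # which depends only on the data, so it shifts every candidate's BIC equally.
        resp, ll = [], 0.0
        for x in vectors:
            logs = [math.log(weights[j]) + sum(c * math.log(probs[j][v]) for v, c in enumerate(x))
                    for j in range(k)]
            m = max(logs)
            ps = [math.exp(lg - m) for lg in logs]
            s = sum(ps)
            resp.append([p / s for p in ps])
            ll += m + math.log(s)
    return resp, ll


def bic(ll, k, d, n):
    """BIC as -2 log L + p log n; p counts k - 1 mixing weights and k * (d - 1) probabilities."""
    params = (k - 1) + k * (d - 1)
    return -2.0 * ll + params * math.log(n)


def choose_k(vectors, k_max):
    """For each candidate k: HAC for initial labels, EM relocation, then keep the lowest BIC."""
    n, d = len(vectors), len(vectors[0])
    best = None
    for k in range(1, k_max + 1):
        labels = [0] * n
        for j, members in enumerate(hac(vectors, k)):
            for i in members:
                labels[i] = j
        resp, ll = em(vectors, labels, k)
        score = bic(ll, k, d, n)
        if best is None or score < best[0]:
            best = (score, k, resp)
    return best[1], best[2]
```

Here the agglomerative step only supplies the initial hard labels for EM, following the tentative-then-relocation order described in the abstract. For brevity the sketch reruns it for each candidate k, whereas a practical version would build one dendrogram and cut it at each level.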