Hierarchical Clustering with Prior Knowledge

Hierarchical clustering is a class of algorithms that seeks to build a hierarchy of clusters. It has been the dominant approach to constructing embedded classification schemes because it outputs dendrograms, which capture the hierarchical relationships among members at all levels of granularity simultaneously. Being greedy in the algorithmic sense, a hierarchical clustering algorithm partitions the data at every step based solely on a similarity or dissimilarity measure. The clustering results often depend not only on the distribution of the underlying data but also on the choice of dissimilarity measure and clustering algorithm. In this paper, we propose a method to incorporate prior domain knowledge about entity relationships into hierarchical clustering. Specifically, we use a distance function in ultrametric space to encode external ontological information. We show that popular linkage-based algorithms can faithfully recover the encoded structure. Similar to some regularized machine learning techniques, we add this distance as a penalty term to the original pairwise distance to regulate the final structure of the dendrogram. As a case study, we applied this method to real data to build a customer-behavior-based product taxonomy for an Amazon service, leveraging information from a larger Amazon-wide browse structure. The method is useful when one wants to leverage relational information from external sources, or when the data used to generate the distance matrix is noisy and sparse. Our work falls into the category of semi-supervised or constrained clustering.
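The two ingredients described above (an ultrametric that encodes an external taxonomy, and that ultrametric added as a penalty to the data-driven distance) can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: the taxonomy is given as a hypothetical mapping from each item to its root-to-leaf path, all leaves sit at the same depth, the ultrametric is the normalized depth below the lowest common ancestor, and `lam` is a hypothetical penalty weight. The names `taxonomy_ultrametric`, `penalized_distance`, and `single_linkage`, along with the toy data, are illustrative only. The naive single-linkage loop stands in for whichever linkage algorithm is actually used.

```python
from itertools import combinations

# Hypothetical toy taxonomy: each item maps to its root-to-leaf path.
PATHS = {
    "earbuds": ("Electronics", "Audio", "Earbuds"),
    "speaker": ("Electronics", "Audio", "Speakers"),
    "tv": ("Electronics", "Video", "TV"),
    "novel": ("Books", "Fiction", "Novel"),
}

# Hypothetical noisy behavior-based distances; earbuds/novel looks
# spuriously close, as can happen when the data are sparse.
DATA = {
    frozenset(pair): d
    for pair, d in [
        (("earbuds", "speaker"), 0.5),
        (("earbuds", "tv"), 0.6),
        (("earbuds", "novel"), 0.2),
        (("speaker", "tv"), 0.5),
        (("speaker", "novel"), 0.9),
        (("tv", "novel"), 0.9),
    ]
}


def shared_depth(a, b):
    """Number of leading taxonomy levels two paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def taxonomy_ultrametric(paths):
    """d_T(i, j) = (depth - shared_depth) / depth for every item pair.

    Assumes all leaves share one depth, so d_T satisfies the strong
    triangle inequality d(i, k) <= max(d(i, j), d(j, k)).
    """
    depth = len(next(iter(paths.values())))
    return {
        frozenset((i, j)): (depth - shared_depth(paths[i], paths[j])) / depth
        for i, j in combinations(paths, 2)
    }


def penalized_distance(data_dist, tree_dist, lam):
    """D'(i, j) = D(i, j) + lam * d_T(i, j), with lam a hypothetical weight."""
    return {pair: data_dist[pair] + lam * tree_dist[pair] for pair in data_dist}


def single_linkage(items, dist):
    """Naive single-linkage agglomeration; returns (height, cluster) merges."""
    clusters = [frozenset([x]) for x in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: distance of the closest cross-cluster pair.
            d = min(dist[frozenset((x, y))] for x in a for y in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((d, a | b))
    return merges


tree = taxonomy_ultrametric(PATHS)
for lam in (0.0, 2.0):
    for height, cluster in single_linkage(list(PATHS), penalized_distance(DATA, tree, lam)):
        print(f"lam={lam}: merge {sorted(cluster)} at {height:.3f}")
```

On this toy data, `lam=0.0` lets the noisy earbuds/novel pair merge first. With `lam=2.0`, the merges follow the taxonomy: Audio first, then Electronics, then the root. Single linkage is used here only because it is the simplest linkage to write out. On a pure ultrametric, the cross-cluster distances between two merging clusters are all equal, so single, complete, and average linkage produce the same dendrogram.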
