Genie+OWA: Robustifying hierarchical clustering with OWA-based linkages

Abstract

We investigate the application of the Ordered Weighted Averaging (OWA) data fusion operator in agglomerative hierarchical clustering. The examined setting generalises the well-known single, complete and average linkage schemes. It makes it possible to embody expert knowledge in the cluster merge process and provides a much wider range of possible linkages. We analyse various families of weighting functions on numerous benchmark data sets in order to assess their influence on the resulting cluster structure. Moreover, we inspect a correction for the inequality of the cluster size distribution, similar to the one used in the Genie algorithm. Our results demonstrate that robustifying the procedure with the Genie correction yields a significant boost in clustering quality. This is particularly beneficial for linkages based on the closest distances between clusters, including the single linkage and its “smoothed” counterparts. To explain this behaviour, we propose a new linkage process called three-stage OWA, which yields further improvements. In this way, we confirm the intuition that hierarchical cluster analysis should take into account a few nearest neighbours of each point rather than try to adapt to their non-local neighbourhood.
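The OWA linkage and the Genie correction mentioned above can be illustrated with a small, deliberately naive Python sketch. It is not the authors' implementation and is far slower than a real one. Assumptions: the distances between two clusters are sorted in nondecreasing order, so the first weight applies to the smallest distance (Yager's original definition sorts in nonincreasing order, which only mirrors the weight vector); the Genie step uses the normalised Gini index of cluster sizes and, whenever it exceeds a threshold `g`, restricts the next merge to pairs involving a smallest cluster. The default `g=0.3` and the helper `k_smallest_w` (a mean of the k smallest distances, standing in for a "smoothed" single linkage) are hypothetical illustrative choices, not weighting families taken from the paper. The three-stage OWA process is not modelled.

```python
import math
from itertools import combinations


def owa(values, weight_gen):
    """OWA aggregate of `values`; weight_gen(m) returns m weights summing to 1.

    Values are sorted in nondecreasing order, so the first weight applies
    to the smallest value (a mirrored form of Yager's convention).
    """
    xs = sorted(values)
    ws = weight_gen(len(xs))
    return sum(w * x for w, x in zip(ws, xs))


# Weight generators recovering the classical linkages.
def single_w(m):
    return [1.0] + [0.0] * (m - 1)      # minimum distance


def complete_w(m):
    return [0.0] * (m - 1) + [1.0]      # maximum distance


def average_w(m):
    return [1.0 / m] * m                # arithmetic mean


def k_smallest_w(k):
    """Hypothetical 'smoothed' single linkage: mean of the k smallest distances."""
    def gen(m):
        j = min(k, m)
        return [1.0 / j] * j + [0.0] * (m - j)
    return gen


def gini(sizes):
    """Normalised Gini index of cluster sizes (0 = all sizes equal)."""
    k = len(sizes)
    if k < 2:
        return 0.0
    num = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return num / ((k - 1) * sum(sizes))


def owa_linkage(A, B, points, weight_gen):
    """OWA of all pairwise Euclidean distances between clusters A and B."""
    dists = [math.dist(points[i], points[j]) for i in A for j in B]
    return owa(dists, weight_gen)


def genie_owa(points, n_clusters, weight_gen, g=0.3):
    """Naive agglomerative clustering with an OWA linkage and the Genie correction."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        sizes = [len(c) for c in clusters]
        pairs = list(combinations(range(len(clusters)), 2))
        if gini(sizes) > g:
            # Sizes too unequal: the merge must involve one of the smallest clusters.
            smallest = min(sizes)
            pairs = [(a, b) for a, b in pairs
                     if min(sizes[a], sizes[b]) == smallest]
        a, b = min(pairs, key=lambda p: owa_linkage(
            clusters[p[0]], clusters[p[1]], points, weight_gen))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


pts = [(0.0,), (1.0,), (2.0,), (4.0,), (5.0,), (6.0,), (100.0,)]
print(genie_owa(pts, 2, single_w, g=1.0))  # [[0, 1, 2, 3, 4, 5], [6]]
print(genie_owa(pts, 2, single_w, g=0.2))  # [[0, 1, 2], [3, 4, 5, 6]]
```

With `g=1.0` the Gini index never exceeds the threshold, so the loop reduces to plain OWA-linkage clustering, and single linkage isolates the outlier at 100 while fusing the two groups. With `g=0.2` the correction forces the outlier to join a cluster early, and the two groups of three points stay apart.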
