Paper information - Aggregation based on intervals as similarity measure for hierarchical clustering

Hierarchical clustering algorithms create groups of objects in a data set based on their similarity. This similarity is commonly measured by a distance function, which makes the resulting groups dependent on both the distance function and the technique used to merge the clusters. In this work, we propose to transform the objects to a representation where each variable is defined by an interval. Using this representation, we define a new method that measures the similarity of the objects variable by variable based on the overlap of their intervals, instead of using a distance function. The results obtained for each variable are then passed to an aggregation function, which makes it possible to compare the similarity between pairs of clusters. As an example, we propose an algorithm that uses a binary function to check for overlap and aggregates the results with the average function. Finally, the method is compared with other approaches by means of two examples.
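The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how objects are turned into intervals or how merged clusters are represented, so the intervals are taken as given inputs here, and the `merge` rule (the per-variable hull of both intervals) is purely an assumption. The function names `overlaps`, `similarity`, `merge`, and `most_similar_pair` are hypothetical.

```python
from itertools import combinations

# A cluster is a list of closed intervals (low, high), one per variable.
# How objects are converted to intervals is NOT specified in the abstract;
# the intervals below are illustrative inputs only.


def overlaps(a, b):
    """Binary overlap check: 1 if the closed intervals share a point, else 0."""
    return 1 if a[0] <= b[1] and b[0] <= a[1] else 0


def similarity(x, y):
    """Aggregate the per-variable overlap results with the average function."""
    scores = [overlaps(a, b) for a, b in zip(x, y)]
    return sum(scores) / len(scores)


def merge(x, y):
    """Assumption: represent a merged cluster by the hull of each interval pair."""
    return [(min(a[0], b[0]), max(a[1], b[1])) for a, b in zip(x, y)]


def most_similar_pair(clusters):
    """Return the index pair with the highest aggregated similarity."""
    pairs = combinations(range(len(clusters)), 2)
    return max(pairs, key=lambda p: similarity(clusters[p[0]], clusters[p[1]]))


a = [(0.0, 1.0), (2.0, 3.0)]
b = [(0.5, 1.5), (5.0, 6.0)]
c = [(4.0, 5.0), (7.0, 8.0)]

print(similarity(a, b))              # overlap on the first variable only
print(most_similar_pair([a, b, c]))  # indices of the pair to merge next
print(merge(a, b))
```

Repeating the "find the most similar pair, then merge" step until one cluster remains would give an agglomerative procedure; swapping `similarity` for another aggregation function changes the merge order without touching the rest of the sketch.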

Susana Montes | Noelia Rico | Pedro Huidobro | Irene Díaz
