Aggregation based on intervals as similarity measure for hierarchical clustering

Hierarchical clustering algorithms group the objects of a data set based on their similarity. This similarity is commonly measured by a distance function, which makes the resulting groups dependent on both the distance function used and the technique employed to merge the clusters. In this work, we propose transforming the objects into a representation in which each variable is defined by an interval. Using this representation, we define a new method that measures the similarity of the objects variable by variable, based on the overlap of their intervals, instead of using a distance function. The results obtained for each variable are then passed to an aggregation function, which makes it possible to compare the similarity between pairs of clusters. We propose an example algorithm that uses a binary function to check for overlap and aggregates its results with the average function. Finally, the method is compared with other approaches by means of two examples.
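
As a rough illustration of the idea described above, the following minimal sketch computes a similarity between two interval-valued objects with a binary overlap check per variable, aggregated by the average. The names `overlaps`, `similarity`, and `merge_hull` are hypothetical. Treating intervals as closed, and merging two clusters by taking the per-variable hull, are assumptions made only for this example; the abstract does not specify either choice.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound), assumed closed


def overlaps(a: Interval, b: Interval) -> int:
    """Binary overlap check: 1 if the two closed intervals intersect, else 0."""
    return 1 if a[0] <= b[1] and b[0] <= a[1] else 0


def similarity(x: List[Interval], y: List[Interval]) -> float:
    """Average of the per-variable binary overlap results."""
    if len(x) != len(y) or not x:
        raise ValueError("objects must have the same, non-zero number of variables")
    return sum(overlaps(a, b) for a, b in zip(x, y)) / len(x)


def merge_hull(x: List[Interval], y: List[Interval]) -> List[Interval]:
    """Assumed merge rule (not from the source): per-variable interval hull."""
    return [(min(a[0], b[0]), max(a[1], b[1])) for a, b in zip(x, y)]


# Two objects described by two interval-valued variables each.
obj_a = [(0.0, 2.0), (5.0, 7.0)]
obj_b = [(1.0, 3.0), (8.0, 9.0)]

sim_ab = similarity(obj_a, obj_b)      # first variable overlaps, second does not
merged = merge_hull(obj_a, obj_b)      # candidate representation of the merged cluster
```

In an agglomerative loop, the pair of clusters with the highest `similarity` would be merged at each step. A different overlap function or aggregation function could be substituted without changing the overall structure.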