Distances and Other Dissimilarity Measures in Chemometrics

Several similarity/diversity measures for data mining and chemometrics are presented and discussed with respect to the different types of data to which they are applied. After a short presentation of the axioms for dissimilarity and similarity functions, their relationships, and the required data pretreatment, the theoretical definitions and formulas of distance and similarity measures for real-valued, binary, ranked, frequency, and mixed-type data are provided, along with the main concepts of distances between sets and meta-distances. Simple examples of calculation are given, and extended comparisons are performed on the distances defined for real-valued and binary data.

Keywords: chemometrics; distance measures; similarity measures; data mining; axioms of distances; meta-distances; binary similarity coefficients
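As a minimal illustration of the kind of simple calculation mentioned above (a sketch, not the authors' code), the following computes two common distances for real-valued vectors and two common similarity coefficients for binary vectors. For binary data it uses the usual contingency counts: a (both 1), b (1 only in the first vector), c (1 only in the second), and d (both 0). The function names are hypothetical, the value returned for two all-zero vectors in the Jaccard-Tanimoto coefficient is an assumed convention, and turning a similarity s in [0, 1] into a dissimilarity via 1 - s is only one of several possible transformations.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two real-valued vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def manhattan(x, y):
    """Manhattan (city-block, L1) distance between two real-valued vectors."""
    return sum(abs(p - q) for p, q in zip(x, y))

def binary_counts(x, y):
    """Contingency counts (a, b, c, d) for two binary vectors."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # both present
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # present only in x
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # present only in y
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # both absent
    return a, b, c, d

def jaccard_tanimoto(x, y):
    """Jaccard-Tanimoto similarity a / (a + b + c); ignores joint absences d."""
    a, b, c, _ = binary_counts(x, y)
    # Assumed convention: two all-zero vectors are treated as identical.
    return a / (a + b + c) if (a + b + c) else 1.0

def simple_matching(x, y):
    """Simple matching similarity (a + d) / n; counts joint absences as agreement."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [4.0, 6.0, 3.0]
    print(euclidean(x, y))  # 5.0
    print(manhattan(x, y))  # 7.0

    u = [1, 1, 0, 0, 1]
    v = [1, 0, 0, 1, 1]
    print(binary_counts(u, v))         # (2, 1, 1, 1)
    print(jaccard_tanimoto(u, v))      # 0.5
    print(simple_matching(u, v))       # 0.6
    print(1 - jaccard_tanimoto(u, v))  # 0.5, a dissimilarity derived from the similarity
```

The two binary coefficients differ only in whether joint absences (d) count as agreement, which is one reason the choice of coefficient matters for sparse binary data such as molecular fingerprints.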