A Stochastic Treatment of Similarity

This study investigates a robust measure of similarity that is applicable in many domains and across many dimensions of data. Given a distance or discrepancy measure on a domain, the similarity of two values in that domain is defined as the probability that a randomly drawn pair of values from the domain is more different (that is, lies at a larger distance) than these two values are. We discuss the motivation for this approach, its properties, and the issues that arise from it.
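
As an illustration of the definition, the following is a minimal sketch of how such a similarity could be estimated empirically. It rests on assumptions that are not part of the text above: a finite sample stands in for the distribution over the domain, pairs are taken as unordered pairs of distinct sample elements, and "more different" is read as a strictly larger distance. The names `stochastic_similarity` and `abs_diff` are hypothetical.

```python
from itertools import combinations


def stochastic_similarity(a, b, sample, distance):
    """Estimate the similarity of a and b as the fraction of pairs in
    `sample` whose distance is strictly larger than distance(a, b).

    Assumption: `sample` approximates the distribution over the domain.
    """
    pairs = list(combinations(sample, 2))  # unordered pairs of distinct elements
    if not pairs:
        raise ValueError("sample must contain at least two values")
    d_ab = distance(a, b)
    farther = sum(1 for x, y in pairs if distance(x, y) > d_ab)
    return farther / len(pairs)


def abs_diff(x, y):
    """Hypothetical distance measure on numbers: absolute difference."""
    return abs(x - y)


sample = [0, 1, 2, 3, 4]
print(stochastic_similarity(0, 1, sample, abs_diff))  # 6 of 10 pairs are farther apart: 0.6
print(stochastic_similarity(0, 4, sample, abs_diff))  # no pair is farther apart: 0.0
print(stochastic_similarity(2, 2, sample, abs_diff))  # every pair is farther apart: 1.0
```

Under these assumptions the value lies between 0 and 1 and depends only on how the distance between the two values ranks among all pairwise distances, not on the scale of the distance measure itself.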