A flexible and robust similarity measure based on contextual probability

Analogy is arguably one of the most important aspects of intelligent reasoning. It has been hypothesized that, given suitable background knowledge, analogy can be viewed as a logical inference process. This study follows another school of thought, which argues that similarity can provide a probabilistic basis for inference and analogy. Most similarity measures (which are frequently viewed as conceptually equivalent to distance measures) are restricted to either nominal or ordinal attributes, and some are confined to classification tasks. This paper proposes a flexible similarity measure that is task-independent and applies to both nominal and ordinal data in a conceptually uniform way. The proposed measure is derived from a probability function and corresponds to the intuition that, if we consider all neighborhoods around a data point, points closer to it should be included in more of these neighborhoods than more distant points. Our experiments indicate that the measure is competitive with commonly used similarity measures.
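
The neighborhood intuition can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: a single ordinal attribute with values 1..n, and a "neighborhood" taken to be any closed interval [a, b] within that range. The function names are hypothetical, and the raw count below is not the probability-based measure itself, only an illustration of why nearer points share more neighborhoods.

```python
from itertools import combinations_with_replacement


def covering_intervals(x, y, n):
    """Count closed intervals [a, b] within 1..n that contain both x and y.

    Brute-force enumeration; intervals are an assumed notion of neighborhood.
    """
    lo, hi = min(x, y), max(x, y)
    return sum(
        1
        # Yields every pair (a, b) with 1 <= a <= b <= n.
        for a, b in combinations_with_replacement(range(1, n + 1), 2)
        if a <= lo and hi <= b
    )


def covering_intervals_closed_form(x, y, n):
    """Same count without enumeration: choices for a times choices for b."""
    lo, hi = min(x, y), max(x, y)
    return lo * (n - hi + 1)


if __name__ == "__main__":
    n, query = 10, 5
    for other in (5, 6, 9):
        print(other, covering_intervals(query, other, n))
```

With n = 10 and a query point at 5, the point 6 shares more intervals with the query than the point 9 does, which matches the intuition that closer points fall into more common neighborhoods.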