On the Relationship between the Posterior and Optimal Similarity

For a classification problem described by the joint density $P(\omega,x)$, the conditional distribution $P(\omega=\omega'|x,x')$ (the ``Bayesian similarity measure'') has been shown to be an optimal similarity measure for nearest neighbor classification. This paper demonstrates several additional properties of that conditional distribution. The paper first shows that the class posterior distribution $P(\omega|x)$ can be reconstructed, up to class labels, from $P(\omega=\omega'|x,x')$, gives a procedure for recovering the class labels, and gives an asymptotically Bayes-optimal classification procedure. It also shows how, given such an optimal similarity measure, to construct a classifier that outperforms the nearest neighbor classifier and achieves Bayes-optimal classification rates. The paper then analyzes Bayesian similarity in a framework where a classifier faces a number of related classification tasks (multitask learning) and illustrates that reconstruction of the class posterior distribution is not possible in general in that setting. Finally, the paper identifies a distinct class of classification problems that are posed in terms of $P(\omega=\omega'|x,x')$ and shows that solving those problems with $P(\omega=\omega'|x,x')$ is Bayes-optimal.
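To make the phrase "up to class labels" concrete, the following is a minimal sketch, not the paper's procedure. It assumes that the two labeled samples $(\omega,x)$ and $(\omega',x')$ are drawn independently from the joint density, in which case $P(\omega=\omega'|x,x') = \sum_{c} P(c|x)\,P(c|x')$. The posterior values and the helper names (`bayesian_similarity`, `similarity_table`, `relabel`) are hypothetical and chosen only for illustration. The sketch checks that permuting the class labels leaves every pairwise similarity unchanged, which is why the similarity measure alone cannot determine which label belongs to which class.

```python
from itertools import permutations

def bayesian_similarity(post_x, post_y):
    """P(w = w' | x, x'), assuming the two labeled samples are independent."""
    return sum(p * q for p, q in zip(post_x, post_y))

# Hypothetical class posteriors P(w | x) at three points, for three classes.
posteriors = {
    "x1": [0.7, 0.2, 0.1],
    "x2": [0.1, 0.8, 0.1],
    "x3": [0.6, 0.3, 0.1],
}

def similarity_table(post):
    """Pairwise similarity for every ordered pair of points."""
    names = sorted(post)
    return {(a, b): bayesian_similarity(post[a], post[b])
            for a in names for b in names}

def relabel(post, perm):
    """Apply the same permutation of class labels to every posterior."""
    return {name: [p[i] for i in perm] for name, p in post.items()}

base = similarity_table(posteriors)

# The similarity table is identical under every relabeling of the classes.
invariant = all(
    abs(similarity_table(relabel(posteriors, perm))[key] - base[key]) < 1e-12
    for perm in permutations(range(3))
    for key in base
)

print(invariant)
print(base[("x1", "x3")] > base[("x1", "x2")])
```

In this toy setting, `x1` and `x3` place most of their posterior mass on the same class, so their similarity is higher than that of `x1` and `x2`, and the invariance check passes for all six label permutations.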
