Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings

Embedding data into vector spaces is a widely used strategy in pattern recognition. When distances between embeddings are quantized, distinct samples can end up at exactly the same distance from a query, so their rank order, and with it any ranking-based performance metric, becomes ambiguous. In this paper, we analyze the ambiguity that quantized distances introduce and provide bounds on its effect. We demonstrate that the effect is measurable on empirical data with state-of-the-art systems. We also approach the phenomenon from a computer security perspective and demonstrate how a party being evaluated by a third party can exploit this ambiguity to greatly outperform a random predictor without even having access to the input data. Finally, we suggest a simple solution that makes ranking-based performance metrics fully deterministic and impervious to such exploits.
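As a rough illustration of the ambiguity described above, the following minimal sketch computes the lowest and highest average precision (AP) that a single query can obtain when tied distances are broken in the least and most favorable way. The function names `average_precision` and `ap_bounds` and the toy data are assumptions made for this example; they are not taken from the paper, and the sketch does not claim to reproduce the bounds or the solution the authors propose.

```python
from typing import Sequence, Tuple


def average_precision(relevant_in_rank_order: Sequence[bool]) -> float:
    """AP of one ranked list; entry i says whether the item at rank i+1 is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant_in_rank_order, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def ap_bounds(distances: Sequence[float],
              relevant: Sequence[bool]) -> Tuple[float, float]:
    """Return (worst-case AP, best-case AP) over all orderings of tied distances."""
    pairs = list(zip(distances, relevant))
    # Worst case: within each group of equal distances, irrelevant items come first
    # (False sorts before True).
    worst = [rel for _, rel in sorted(pairs, key=lambda p: (p[0], p[1]))]
    # Best case: within each group of equal distances, relevant items come first.
    best = [rel for _, rel in sorted(pairs, key=lambda p: (p[0], not p[1]))]
    return average_precision(worst), average_precision(best)


# Toy query: four retrieved items whose distances were quantized to two levels.
distances = [1, 1, 2, 2]
relevant = [True, False, True, False]
low, high = ap_bounds(distances, relevant)
print(f"AP lies between {low:.3f} and {high:.3f}")
```

With these toy values the same embedding yields an AP anywhere between 0.500 and 0.833, depending only on how ties happen to be ordered. Reporting a fixed convention, for example the worst-case value, is one way to make such a metric deterministic; whether this matches the solution suggested in the paper is not stated in the abstract.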
