Tensor Product of Correlated Textual and Visual Features: A Quantum Theory Inspired Image Retrieval Framework

In multimedia information retrieval, a document may contain both textual and visual content features. Documents are often ranked by heuristically combining the feature spaces of the different media types, or by combining ranking scores computed independently from each feature space. In this paper, we propose a principled approach inspired by Quantum Theory. Specifically, we propose a tensor-product-based model that represents the textual and visual content features of an image as a non-separable composite system. The ranking scores of the images are then computed in the form of a quantum measurement. In addition, the framework incorporates correlations between features of different media types. Experiments on ImageCLEF 2007 show promising performance for the tensor-based approach.
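
To make the idea of a tensor product representation scored by a measurement concrete, the following is a minimal sketch and not the model defined in this paper. It assumes real-valued feature vectors, a query that is itself a product of a text vector and a visual vector, and a squared inner product as the measurement. The function names (`unit`, `product_state`, `correlated_state`, `measurement_score`) and the `weights` matrix used to inject cross-media correlations are hypothetical and chosen only for illustration. Plain Python lists are used so that the sketch runs without third-party packages.

```python
import math


def unit(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def product_state(text_vec, visual_vec):
    """Tensor (Kronecker) product of the normalised text and visual vectors.

    The result is a separable state: entry (i, j) is t[i] * v[j].
    """
    t, v = unit(text_vec), unit(visual_vec)
    return [ti * vj for ti in t for vj in v]


def correlated_state(text_vec, visual_vec, weights):
    """Reweight each text-visual amplitude by weights[i][j], then renormalise.

    ASSUMPTION: `weights` is an illustrative stand-in for cross-media
    correlations. Unless it factorises into a rank-one matrix, the result
    is generally not a product of one text and one visual vector.
    """
    t, v = unit(text_vec), unit(visual_vec)
    amplitudes = [
        t[i] * v[j] * weights[i][j]
        for i in range(len(t))
        for j in range(len(v))
    ]
    return unit(amplitudes)


def measurement_score(query_state, doc_state):
    """Squared overlap between query and document states.

    For unit vectors this is the probability of projecting the document
    state onto the query state, a value between 0 and 1.
    """
    overlap = sum(q * d for q, d in zip(query_state, doc_state))
    return overlap * overlap


# Toy example with two text dimensions and two visual dimensions.
query = product_state([1.0, 0.0], [1.0, 0.0])
separable_doc = product_state([1.0, 1.0], [1.0, 0.0])
entangled_doc = correlated_state(
    [1.0, 1.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]
)

separable_score = measurement_score(query, separable_doc)
entangled_score = measurement_score(query, entangled_doc)
```

For a separable document the score factorises into the product of the squared text similarity and the squared visual similarity. The reweighted state in `entangled_doc` has no such factorisation, which is the property the non-separable representation is meant to capture.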