Continuum regression for cross-modal multimedia retrieval

Understanding the relationships among different modalities is a challenging task. Canonical correlation analysis (CCA) and its variants are frequently used and have proved effective for building a common space in which the correlation between modalities is maximized. In this paper, we show that CCA and its variants may cause information dissipation when switching between modalities, and we therefore propose using the continuum regression (CR) model to address this problem. In particular, we adopt the CR model with a fixed variance coefficient of 1/2. We also apply a multinomial logistic regression model for the subsequent classification task. To evaluate the CR model, we perform a series of cross-modal retrieval experiments on two modalities, namely image and text. Experimental results show that the CR model achieves the best retrieval precision among the compared methods, which demonstrates the potential of our method for real-world Internet search applications.
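
As a rough illustration of the idea, the sketch below assumes that the coefficient of 1/2 follows the parameterization of Stone and Brooks, under which continuum regression coincides with partial least squares, i.e. the projection directions maximize covariance rather than correlation. It is a minimal two-view stand-in built on an SVD of the cross-covariance matrix, not the implementation evaluated in the paper. The function names, the synthetic features, and the cosine-similarity ranking are all hypothetical choices made for this example.

```python
import numpy as np

def max_covariance_directions(X, Y, n_components=2):
    """Return unit-norm directions (W, V) that maximize cov(X w, Y v).

    Assumption: this stands in for CR with coefficient 1/2 (the PLS case).
    """
    Xc = X - X.mean(axis=0)                 # center each view
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (X.shape[0] - 1)        # cross-covariance matrix
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :n_components], Vt[:n_components].T, s[:n_components]

def rank_by_cosine(query, gallery):
    """Return gallery indices sorted by decreasing cosine similarity."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

# Synthetic stand-ins for paired image and text features (hypothetical data).
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))               # shared latent content
X = Z @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))  # "image"
Y = Z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))    # "text"

W, V, s = max_covariance_directions(X, Y)
img_proj = (X - X.mean(axis=0)) @ W         # images in the common space
txt_proj = (Y - Y.mean(axis=0)) @ V         # texts in the common space

# Image-to-text retrieval: rank all texts for the first image query.
ranking = rank_by_cosine(img_proj[0], txt_proj)
```

CCA would additionally whiten each view before the SVD, which discards the within-modality variance. Keeping that variance in the objective is one plausible reading of how the CR model reduces information dissipation.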