Multi-Output Regression with Tag Correlation Analysis for Effective Image Tagging

Automatic image tagging is one of the most important research topics in multimedia. How to tag images accurately, and so bridge the semantic gap between image content and users' semantic understanding, has been widely studied over the last decade. One common approach is to cast image tagging as a multi-task learning problem. However, most existing methods ignore tag correlations in the learning process. In this paper, we show the importance of tag correlations in multi-task learning. We formulate image tagging as a multi-output regression problem that accounts for tag correlations, which are captured by two covariance matrices across all tags: one for the regression coefficients and one for the noise. Combining multi-output regression with tag correlation analysis exploits the latent dependencies among tags to overcome limitations of existing work. Extensive experiments on two benchmark datasets confirm that our approach outperforms state-of-the-art methods.
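To make the formulation concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a linear model Y = XW + E, where each column of Y is one tag, and it fits W with plain ridge regression before estimating the two tag-by-tag covariance matrices after the fact. The abstract describes accounting for these covariances during learning, so this post hoc estimate only illustrates what the two matrices measure. All names (`fit_multi_output_ridge`, `tag_covariances`, `lam`) and the synthetic data are hypothetical.

```python
import numpy as np


def fit_multi_output_ridge(X, Y, lam=1.0):
    """Closed-form ridge fit for all tags at once.

    Solves (X^T X + lam * I) W = X^T Y, so W has one column per tag.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


def tag_covariances(X, Y, W):
    """Empirical tag-by-tag covariances of the coefficients and of the noise."""
    residuals = Y - X @ W                         # (n, k) estimated noise
    noise_cov = np.cov(residuals, rowvar=False)   # (k, k), columns are tags
    coef_cov = np.cov(W, rowvar=False)            # (k, k), columns are tags
    return coef_cov, noise_cov


# Synthetic stand-in for image features X and tag scores Y (assumed data).
rng = np.random.default_rng(0)
n, d, k = 200, 10, 4                              # images, feature dims, tags
X = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, k))
# Make tags 0 and 1 depend on the features in nearly the same way.
W_true[:, 1] = W_true[:, 0] + 0.1 * rng.normal(size=d)
Y = X @ W_true + 0.5 * rng.normal(size=(n, k))

W = fit_multi_output_ridge(X, Y, lam=1.0)
coef_cov, noise_cov = tag_covariances(X, Y, W)

# Correlation between the coefficient vectors of tags 0 and 1.
coef_corr_01 = coef_cov[0, 1] / np.sqrt(coef_cov[0, 0] * coef_cov[1, 1])
print("coefficient correlation, tags 0 and 1:", coef_corr_01)
```

In this toy setup, tags 0 and 1 are constructed to share nearly identical coefficients, so their entry in `coef_cov` is large relative to the other off-diagonal entries. A method that learns the covariances jointly with W could use that structure to share information between related tags, which is the dependency the abstract refers to.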
