Personalized Image Annotation Using Deep Architecture

In image-centric social networks such as Instagram and Pinterest, users tend to share photos with several tags. These tags may describe the content of the image or provide additional contextual information, so they are not necessarily tied to image content and often reflect personal preference. Personalized image annotation in social networks is therefore in demand. However, existing image annotation models, which rely only on image content, cannot capture a user's tagging preferences. In this paper, we propose a deep architecture for personalized image annotation that leverages the wealth of information in a user's tagging history. The proposed architecture consists of three components: one that learns features of the image content, one that learns features of the user's history tags, and one that combines the two learned features to predict tags. We also explore two ways to model a user's history tags: 1) simply averaging the embeddings of the history tags and 2) modeling the history tags as a sequence with a long short-term memory (LSTM) recurrent neural network. We evaluate the proposed architecture on a large-scale, realistic data set consisting of ~22.8 million public images uploaded by ~4.69 million users. Experimental results show that the proposed architecture is effective on the personalized image annotation task.
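
The three-component design with the averaging variant of history modeling can be illustrated with a minimal sketch. This is not the authors' implementation: the tag vocabulary, the dimensions, the random weights, the fusion by concatenation, and the softmax output are all assumptions made for illustration, and the image feature vector stands in for the output of an image-content network. The LSTM variant is not shown.

```python
import math
import random

random.seed(0)

# Hypothetical tag vocabulary and sizes, chosen only for illustration.
VOCAB = ["sunset", "beach", "travel", "food", "cat"]
EMB_DIM = 4  # assumed tag-embedding size
IMG_DIM = 6  # assumed image-feature size (stand-in for a CNN output)


def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# One embedding vector per tag (history-tag component).
tag_embeddings = dict(zip(VOCAB, rand_matrix(len(VOCAB), EMB_DIM)))
# Weights of the combining component: one row of weights per output tag.
W = rand_matrix(len(VOCAB), IMG_DIM + EMB_DIM)


def average_history(history):
    """Average the embeddings of the user's history tags (variant 1)."""
    vecs = [tag_embeddings[t] for t in history if t in tag_embeddings]
    if not vecs:
        return [0.0] * EMB_DIM  # user with no usable history
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def predict_tags(image_features, history, top_k=3):
    """Combine image and history features, then rank tags by score."""
    # Assumed fusion: simple concatenation of the two feature vectors.
    fused = list(image_features) + average_history(history)
    logits = [sum(w * x for w, x in zip(row, fused)) for row in W]
    # Assumed output layer: numerically stable softmax over the vocabulary.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(VOCAB, probs), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    image_features = [0.1] * IMG_DIM  # placeholder for real image features
    for tag, prob in predict_tags(image_features, ["beach", "travel"]):
        print(f"{tag}: {prob:.3f}")
```

With the same image features, two users with different tagging histories receive different fused vectors and therefore potentially different tag rankings, which is the sense in which the prediction is personalized.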
