MMDF-LDA: An improved Multi-Modal Latent Dirichlet Allocation model for social image annotation

Abstract Social image annotation, which aims to infer a set of semantic concepts for a social image, is an effective and straightforward way to facilitate social image search. Conventional approaches mainly rely on visual features and tags, without considering other types of metadata. How to improve the accuracy of social image annotation by fully exploiting multi-modal features remains an open and challenging problem. In this paper, we propose an improved Multi-Modal Data Fusion based Latent Dirichlet Allocation (LDA) topic model (MMDF-LDA) that annotates social images by fusing visual content, user-supplied tags, user comments, and geographic information. When MMDF-LDA samples annotations for one data modality, all the other data modalities are exploited. In MMDF-LDA, geographical topics are generated from the GPS locations of social images, and annotations have different probabilities of being used in different geographical regions. Each social image is divided into several patches in advance, and MMDF-LDA then assigns annotations to the patches by estimating the probability of each annotation-patch assignment. Through experiments on social image annotation and retrieval on several datasets, we demonstrate the effectiveness of the proposed MMDF-LDA model in comparison with state-of-the-art methods.
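The annotation-patch assignment step can be illustrated with a minimal sketch. It is not the MMDF-LDA inference procedure, which the abstract does not specify. It assumes that per-patch topic proportions and per-topic annotation probabilities (for a single geographical region) have already been estimated, and it simply marginalises over topics to score each annotation for each patch. The function name `annotate_patches`, the variable names, and all numbers are hypothetical and chosen for illustration only.

```python
def annotate_patches(patch_topic, topic_annotation, vocabulary):
    """Pick the most probable annotation for each image patch.

    patch_topic[p][k]      : assumed p(topic k | patch p)
    topic_annotation[k][w] : assumed p(annotation w | topic k), one region
    vocabulary[w]          : annotation string for index w
    """
    labels = []
    for theta in patch_topic:
        # Marginalise over topics:
        # p(w | patch) = sum_k p(k | patch) * p(w | k)
        scores = [
            sum(theta[k] * topic_annotation[k][w] for k in range(len(theta)))
            for w in range(len(vocabulary))
        ]
        # Keep the annotation with the highest score for this patch.
        best = max(range(len(vocabulary)), key=scores.__getitem__)
        labels.append((vocabulary[best], scores[best]))
    return labels


# Toy inputs, invented for illustration only (2 patches, 2 topics, 3 annotations).
vocabulary = ["beach", "sky", "building"]
patch_topic = [[0.9, 0.1], [0.2, 0.8]]
topic_annotation = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

result = annotate_patches(patch_topic, topic_annotation, vocabulary)
for name, score in result:
    print(f"{name}: {score:.2f}")
```

In a region-aware variant, `topic_annotation` would be selected per image according to the geographical topic of its GPS location, so the same patch could receive different annotations in different regions.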
