Visual understanding by mining social media: recent advances and challenges

The rapid growth of social websites has dramatically increased the volume of social media, including images and videos, and visual understanding has consequently attracted great interest in areas such as multimedia, computer vision, and pattern recognition. Valuable auxiliary resources available on social websites, such as user-provided tags, can aid visual understanding tasks. Accordingly, many methods have been proposed that exploit these auxiliary resources for tag refinement, image retrieval, and media summarization. This work presents a comprehensive survey of recent advances in visual understanding by mining social media and discusses their merits and limitations. We then analyze the difficulties and challenges of visual understanding and outline several possible directions for future research.
