Multimedia venue semantic modeling based on multimodal data

Abstract
A large volume of text and multimedia (image and video) data about venues is generated continuously. Modeling the semantics of these venues requires analyzing text and multimedia user-generated content (UGC) jointly rather than in isolation. This is difficult for location-based social networks (LBSNs), however, because their text and multimedia UGC tend to be uncorrelated. In this paper, we propose a novel multimedia location topic modeling approach to address this problem. We first use Recurrent Convolutional Networks to establish correlations between multimedia UGC and text. We then construct a graph model from these correlations and apply a graph clustering method to detect the latent multimedia topics of each venue. Building on the resulting venue semantics, we propose techniques to model multimedia location topics and to perform semantic-based location summarization, venue prediction, and image description. Extensive experiments on a cross-platform dataset demonstrate the superiority of the proposed method.
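
The abstract describes the pipeline only at a high level: correlate UGC items, build a graph from the correlations, then cluster the graph into per-venue topics. The sketch below illustrates that flow under several loud assumptions that do not come from the paper. It assumes each UGC item has already been mapped into a shared embedding space (the vectors are hand-made placeholders, not Recurrent Convolutional Network outputs), it treats cosine similarity above a hypothetical `threshold` as the text/image correlation, and it uses connected components as a deliberately simple stand-in for the paper's unspecified graph clustering method. The function names `build_graph`, `cluster`, and `multimedia_topics` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_graph(items, threshold=0.8):
    """Build an undirected weighted graph over UGC items.

    `items` maps an item id to (modality, embedding). Two items are linked when the
    cosine similarity of their embeddings is at least `threshold` (an assumed
    stand-in for the learned text/image correlation); the similarity is the edge weight.
    """
    graph = {item_id: {} for item_id in items}
    ids = sorted(items)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            weight = cosine(items[a][1], items[b][1])
            if weight >= threshold:
                graph[a][b] = weight
                graph[b][a] = weight
    return graph


def cluster(graph):
    """Return the connected components of `graph`; each one is a candidate topic."""
    seen, topics = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        topics.append(sorted(component))
    return topics


def multimedia_topics(topics, items):
    """Keep only topics that mix modalities, i.e. that tie images to text."""
    return [t for t in topics if len({items[i][0] for i in t}) > 1]


# Hypothetical UGC for one venue; the vectors are hand-made placeholders,
# not outputs of a trained network.
example_items = {
    "img_1": ("image", [0.9, 0.1, 0.0]),
    "txt_1": ("text", [0.8, 0.2, 0.1]),
    "img_2": ("image", [0.0, 0.1, 0.9]),
    "txt_2": ("text", [0.1, 0.0, 1.0]),
    "txt_3": ("text", [0.0, 1.0, 0.0]),
}

if __name__ == "__main__":
    topics = cluster(build_graph(example_items, threshold=0.8))
    print(topics)  # [['img_1', 'txt_1'], ['img_2', 'txt_2'], ['txt_3']]
    print(multimedia_topics(topics, example_items))
```

Connected components are used here only because they are easy to verify by hand. On a thresholded graph they can merge unrelated topics through chains of weak links, so a clustering method that weighs edge strength would likely be a better fit for real venue data.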
