Text Mining in Multimedia

A large amount of multimedia data (e.g., image and video) is now available on the Web. A multimedia entity does not appear in isolation, but is accompanied by various forms of metadata, such as surrounding text, user tags, ratings, and comments etc. Mining these textual metadata has been found to be effective in facilitating multimedia information processing and management. A wealth of research efforts has been dedicated to text mining in multimedia. This chapter provides a comprehensive survey of recent research efforts. Specifically, the survey focuses on four aspects: (a) surrounding text mining; (b) tag mining; (c) joint text and visual content mining; and (d) cross text and visual content mining. Furthermore, open research issues are identified based on the current research efforts.

[1]  Wei-Ying Ma,et al.  Hierarchical clustering of WWW image search results using visual, textual and link information , 2004, MULTIMEDIA '04.

[2]  Ling Chen,et al.  Event detection from flickr data through wavelet-based spatial analysis , 2009, CIKM.

[3]  Roelof van Zwol,et al.  Flickr tag recommendation based on collective knowledge , 2008, WWW.

[4]  Tat-Seng Chua,et al.  A bootstrapping framework for annotating and retrieving WWW images , 2004, MULTIMEDIA '04.

[5]  Wei-Ying Ma,et al.  IGroup: web image search results clustering , 2006, MM '06.

[6]  Shumeet Baluja,et al.  VisualRank: Applying PageRank to Large-Scale Image Search , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Wei-Ying Ma,et al.  Multi-model similarity propagation and its application for web image retrieval , 2004, MULTIMEDIA '04.

[8]  Adrian Ulges,et al.  Identifying relevant frames in weakly labeled videos for training concept detectors , 2008, CIVR '08.

[9]  Wei-Ying Ma,et al.  A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieva , 2005, ICCV.

[10]  Marcel Worring,et al.  Learning Social Tag Relevance by Neighbor Voting , 2009, IEEE Transactions on Multimedia.

[11]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[12]  Philip S. Yu,et al.  Transfer Learning on Heterogenous Feature Spaces via Spectral Transformation , 2010, 2010 IEEE International Conference on Data Mining.

[13]  Yi-Hsuan Yang,et al.  ContextSeer: context search and recommendation at query time for shared consumer photos , 2008, ACM Multimedia.

[14]  Dong Liu,et al.  Tag ranking , 2009, WWW '09.

[15]  Meng Wang,et al.  Tagging tags , 2010, ACM Multimedia.

[16]  Pietro Perona,et al.  A Visual Category Filter for Google Images , 2004, ECCV.

[17]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, ICCV.

[18]  Dong Liu,et al.  Image retagging , 2010, ACM Multimedia.

[19]  Shih-Fu Chang,et al.  To search or to label?: predicting the performance of search-based automatic image classifiers , 2006, MIR '06.

[20]  Shih-Fu Chang,et al.  Label diagnosis through self tuning for web image search , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  William I. Grosky,et al.  Narrowing the semantic gap - improved text-based web document retrieval using visual features , 2002, IEEE Trans. Multim..

[22]  Charu C. Aggarwal,et al.  Towards semantic knowledge propagation from text corpus to web images , 2011, WWW.

[23]  Meng Wang,et al.  ShotTagger: tag location for internet videos , 2011, ICMR.

[24]  Bo Zhang,et al.  A unified framework for image retrieval using keyword and visual features , 2005, IEEE Transactions on Image Processing.

[25]  Shih-Fu Chang,et al.  Label diagnosis through self tuning forweb image search , 2009, CVPR.

[26]  Dong Liu,et al.  Unified tag analysis with multi-edge graph , 2010, ACM Multimedia.

[27]  Wei-Ying Ma,et al.  Iteratively clustering web images based on link and attribute reinforcements , 2005, ACM Multimedia.

[28]  Rong Yan,et al.  Multimedia Search with Pseudo-relevance Feedback , 2003, CIVR.

[29]  Dong Liu,et al.  Content-based tag processing for Internet social images , 2010, Multimedia Tools and Applications.

[30]  Shih-Fu Chang,et al.  Reranking Methods for Visual Search , 2007, IEEE MultiMedia.

[31]  Wei-Ying Ma,et al.  IGroup: a web image search engine with semantic clustering of search results , 2006, MM '06.

[32]  Sourav S. Bhowmick,et al.  Quantifying tag representativeness of visual content of social images , 2010, ACM Multimedia.

[33]  Charu C. Aggarwal,et al.  Text Mining in Social Networks , 2011, Social Network Data Analytics.

[34]  Qiang Yang,et al.  Translated Learning: Transfer Learning across Different Feature Spaces , 2008, NIPS.

[35]  Nenghai Yu,et al.  Flickr distance , 2008, ACM Multimedia.

[36]  Tao Mei,et al.  Graph-based semi-supervised learning with multiple labels , 2009, J. Vis. Commun. Image Represent..

[37]  Gang Wang,et al.  Object image retrieval by exploiting online knowledge resources , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Marcel Worring,et al.  Unsupervised multi-feature tag relevance learning for social image retrieval , 2010, CIVR '10.

[39]  Meng Wang,et al.  Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation , 2009, IEEE Transactions on Multimedia.

[40]  Hai Jin,et al.  Label to region by bi-layer sparsity priors , 2009, MM '09.

[41]  Meng Wang,et al.  Unified Video Annotation via Multigraph Learning , 2009, IEEE Transactions on Circuits and Systems for Video Technology.

[42]  David A. Shamma,et al.  Watch what I watch: using community activity to understand content , 2007, MIR '07.

[43]  Qiang Yang,et al.  Heterogeneous Transfer Learning for Image Clustering via the SocialWeb , 2009, ACL.

[44]  Shuicheng Yan,et al.  Image tag refinement towards low-rank, content-tag prior and error sparsity , 2010, ACM Multimedia.

[45]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[46]  Xian-Sheng Hua,et al.  Content-aware Ranking for visual search , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47]  Bingbing Ni,et al.  Assistive tagging: A survey of multimedia tagging with human-computer joint exploration , 2012, CSUR.

[48]  Shih-Fu Chang,et al.  Visually Searching the Web for Content , 1997, IEEE Multim..

[49]  Qiang Yang,et al.  Heterogeneous Transfer Learning for Image Classification , 2011, AAAI.

[50]  Rong Yan,et al.  Co-retrieval: A Boosted Reranking Approach for Video Retrieval , 2004, CIVR.

[51]  Xian-Sheng Hua,et al.  Bayesian video search reranking , 2008, ACM Multimedia.

[52]  Ivor W. Tsang,et al.  Tag-based web photo retrieval improved by batch mode re-tagging , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[53]  Hao Xu,et al.  Tag refinement by regularized LDA , 2009, ACM Multimedia.

[54]  Tao Mei,et al.  CrowdReranking: exploring multiple search engines for visual search reranking , 2009, SIGIR.

[55]  De Xu,et al.  Beyond tag relevance: integrating visual attention model and multi-instance learning for tag saliency ranking , 2010, CIVR '10.

[56]  Michael J. Swain,et al.  WebSeer: An Image Search Engine for the World Wide Web , 1996 .

[57]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  Hai Jin,et al.  Nonparametric Label-to-Region by search , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[59]  Rohini K. Srihari,et al.  Automatic Indexing and Content-Based Retrieval of Captioned Images , 1995, Computer.

[60]  Meng Wang,et al.  MSRA atT TRECVID 2008: High-Level Feature Extraction and Automatic Search , 2008, TRECVID.

[61]  Yi Yang,et al.  Interactive Video Indexing With Statistical Active Learning , 2012, IEEE Transactions on Multimedia.

[62]  Jianping Fan,et al.  Harvesting large-scale weakly-tagged image databases from the web , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[63]  Jing Hua,et al.  Graph theoretical framework for simultaneously integrating visual and textual features for efficient web image clustering , 2008, WWW.

[64]  Wei-Ying Ma,et al.  A probabilistic semantic model for image annotation and multi-modal image retrieval , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[65]  Tao Qin,et al.  Web image clustering by consistent utilization of visual features and surrounding texts , 2005, MULTIMEDIA '05.

[66]  Dong Xu,et al.  Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction , 2006, TRECVID.