In-video product annotation with web information mining

Product annotation in videos is of great importance for video browsing, search, and advertising. However, most existing automatic video annotation research focuses on high-level concepts such as events, scenes, and object categories. This article presents a novel solution for annotating specific products in videos by mining information from the Web. It collects a set of high-quality training images for each product by simultaneously leveraging Amazon and the Google image search engine. A visual signature for each product is then built from the bag-of-visual-words representations of these training images, and a correlative sparsification approach is applied to remove noisy bins from the signatures. The resulting signatures are matched against video frames to produce the annotations. We conduct experiments on more than 1,000 videos, and the results demonstrate the feasibility and effectiveness of our approach.
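
The pipeline described above can be illustrated with a short sketch. The Python code below is a minimal illustration, not the authors' implementation: it assumes dense patch descriptors in place of the paper's local features, a k-means visual vocabulary, and a simple magnitude-based pruning step standing in for the correlative sparsification method; all function names, vocabulary sizes, and thresholds are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def extract_descriptors(image, patch=8, stride=8):
    """Crude dense-patch descriptors; a stand-in for the local features
    (e.g., SIFT) used in the paper. The descriptor choice is an assumption."""
    h, w = image.shape[:2]
    descs = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            descs.append(image[y:y + patch, x:x + patch].ravel().astype(np.float32))
    return np.array(descs)


def build_vocabulary(training_images, k=200):
    """Cluster all local descriptors into k visual words (k is an assumption)."""
    all_descs = np.vstack([extract_descriptors(img) for img in training_images])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descs)


def bow_histogram(image, vocab):
    """Quantize an image's descriptors against the vocabulary; L1-normalize."""
    words = vocab.predict(extract_descriptors(image))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1e-12)


def product_signature(product_images, vocab, keep=0.8):
    """Average the training histograms, then zero out the smallest bins --
    a simplified proxy for the paper's correlative sparsification step."""
    sig = np.mean([bow_histogram(img, vocab) for img in product_images], axis=0)
    cutoff = np.quantile(sig, 1.0 - keep)
    sig[sig < cutoff] = 0.0
    return sig / max(sig.sum(), 1e-12)


def annotate_frame(frame, signatures, vocab, threshold=0.3):
    """Return the products whose signatures are cosine-similar enough
    to the frame's bag-of-visual-words histogram (threshold is illustrative)."""
    h = bow_histogram(frame, vocab)
    labels = []
    for name, sig in signatures.items():
        denom = np.linalg.norm(h) * np.linalg.norm(sig)
        if denom > 0 and float(h @ sig) / denom >= threshold:
            labels.append(name)
    return labels
```

In this sketch, the web-mined training images for each product feed `product_signature`, and each video frame is annotated independently with `annotate_frame`; a real system would add the temporal and spatial verification that frame-by-frame matching alone cannot provide.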
