Recent Advances and Challenges of Semantic Image/Video Search

We present an overview of recent advances and major challenges in image and video search, with a specific focus on large-scale semantic concept detection and indexing. Such semantic indexing paradigm has been driven by the increasing availability of the large resources of corpora, novel labeling approaches, innovative image features, and machine learning techniques for visual content recognition. We will discus key approaches, recent results, and novel applications in text-to-concept semantic search and multi-modal retrieval models. Open issues and major opportunities are also presented.

[1]  Shih-Fu Chang,et al.  To search or to label?: predicting the performance of search-based automatic image classifiers , 2006, MIR '06.

[2]  John R. Smith,et al.  Modeling semantic concepts to support query by keywords in video , 2002, Proceedings. International Conference on Image Processing.

[3]  Laura A. Dabbish,et al.  Labeling images with a computer game , 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.

[4]  Marcel Worring,et al.  The Mediamill Semantic Video Search Engine , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[5]  Wei-Ying Ma,et al.  Graph based multi-modality learning , 2005, ACM Multimedia.

[6]  Rong Yan,et al.  Extreme video retrieval: joint maximization of human and computer performance , 2006, MM '06.

[7]  Mark Sanderson,et al.  The CLEF Cross Language Image Retrieval Track (ImageCLEF) 2004 , 2004, CLEF.

[8]  Jean-Luc Gauvain,et al.  Lightly supervised and unsupervised acoustic model training , 2002, Comput. Speech Lang..

[9]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[10]  Dong Xu,et al.  Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction , 2006, TRECVID.

[11]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[12]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.

[13]  Shih-Fu Chang,et al.  Automatic discovery of query-class-dependent models for multimodal search , 2005, MULTIMEDIA '05.

[14]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Rainer Lienhart,et al.  PLSA on Large Scale Image Databases , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[16]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[17]  Ara V. Nefian,et al.  Web Image Clustering , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[18]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[19]  Rong Yan,et al.  Learning query-class dependent weights in automatic video retrieval , 2004, MULTIMEDIA '04.

[20]  Marc Davis,et al.  Media Streams: an iconic visual language for video annotation , 1993, Proceedings 1993 IEEE Symposium on Visual Languages.

[21]  Alexander G. Hauptmann,et al.  LSCOM Lexicon Definitions and Annotations (Version 1.0) , 2006 .

[22]  Arnold W. M. Smeulders,et al.  The Amsterdam Library of Object Images , 2004, International Journal of Computer Vision.

[23]  Wei-Ying Ma,et al.  AnnoSearch: Image Auto-Annotation by Search , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).