Retina enhanced SURF descriptors for spatio-temporal concept detection

This paper investigates the potential benefit of using low-level human vision behaviors for high-level semantic concept detection. Many current approaches rely on the Bag-of-Words (BoW) model, which has proven to be a good choice, especially for object recognition in images. Extending it from static images to video sequences raises new problems, chiefly how to exploit the temporal information related to the concepts to be detected (swimming, drinking, ...). In this study, we propose to apply a human retina model to preprocess video sequences before running a state-of-the-art BoW analysis. This preprocessing, designed to enhance relevant information, improves performance by introducing robustness to common image and video problems such as luminance variation, shadows, compression artifacts, and noise. Additionally, we propose a new segmentation method that selects low-level spatio-temporal potential areas of interest from the visual scene without slowing computation as much as a high-level saliency model would. These approaches are evaluated on the TrecVid 2010 and 2011 Semantic Indexing Task datasets, which contain from 130 to 346 high-level semantic concepts. We also experiment with various parameter settings to assess their effect on performance.
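As a rough illustration of the BoW step mentioned above, the following minimal sketch quantizes local descriptors against a codebook and builds a normalized histogram. It is not the pipeline evaluated in the paper: the function name `bow_histogram`, the toy 2-D descriptors (standing in for SURF descriptors), and the hand-written codebook (which would normally come from a clustering step such as k-means) are all assumptions made for illustration, and the retina preprocessing is not shown.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and return
    an L1-normalized histogram of codeword counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the closest codeword under Euclidean distance.
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors (e.g. an empty frame): return an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Hypothetical toy data: three 2-D codewords and four 2-D descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]

hist = bow_histogram(descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

In a video setting, one such histogram would typically be computed per frame or per shot, after whatever preprocessing is applied to the frames, and then passed to a classifier.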
