A saliency-based approach to event recognition

Abstract

Over the last few years, a number of solutions covering different aspects of event recognition have been proposed for event-based multimedia analysis. Existing approaches mostly focus on efficient image representations and advanced classification schemes. However, it would be desirable to focus on the event-specific information available in the image, namely the so-called event saliency. In this paper, we propose a novel approach based on multiple instance learning (MIL) to learn the visual features contained in event-salient regions, which are extracted through a crowdsourcing study. In total, we collected the salient regions for 76 different events from four large-scale datasets. The experimental results demonstrate the effectiveness of using only event-related regions, which yields a significant performance gain over the state of the art.
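The MIL setting mentioned above can be illustrated with a small sketch. In that setting, each image is treated as a bag of candidate regions (instances), and only the image-level event label is known. The sketch assumes a linear region scorer and the common max-rule, under which an image is positive if at least one of its regions is. The MI-perceptron training loop, all function names and the toy 2-D features are hypothetical illustrations, not the model proposed in this paper.

```python
from typing import List, Sequence, Tuple

Instance = Sequence[float]  # feature vector of one image region (hypothetical)
Bag = List[Instance]        # all candidate regions of one image


def instance_score(w: Sequence[float], b: float, x: Instance) -> float:
    """Linear score of one region; > 0 means 'event-salient' for this class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_scores(w: Sequence[float], b: float, bag: Bag) -> List[float]:
    """Scores of every region in the bag."""
    return [instance_score(w, b, x) for x in bag]


def predict_bag(w: Sequence[float], b: float, bag: Bag) -> int:
    """Standard MIL assumption: an image is positive iff at least one region is."""
    return 1 if max(bag_scores(w, b, bag)) > 0 else 0


def witness_index(w: Sequence[float], b: float, bag: Bag) -> int:
    """Index of the highest-scoring region, i.e. the one that decides the label."""
    scores = bag_scores(w, b, bag)
    return scores.index(max(scores))


def train_mi_perceptron(bags: List[Bag], labels: List[int], dim: int,
                        epochs: int = 20, lr: float = 0.1) -> Tuple[List[float], float]:
    """Toy MIL learner (hypothetical, not the paper's method).

    Positive bags: only the witness region is pushed up, since the other
    regions may be background. Negative bags: every region is negative, so
    all regions that currently score > 0 are pushed down.
    """
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        mistakes = 0
        for bag, y in zip(bags, labels):
            if predict_bag(w, b, bag) == y:
                continue
            mistakes += 1
            if y == 1:
                x = bag[witness_index(w, b, bag)]
                w = [wi + lr * xi for wi, xi in zip(w, x)]
                b += lr
            else:
                # Scores are computed once for this bag, before any update.
                for x, s in zip(bag, bag_scores(w, b, bag)):
                    if s > 0:
                        w = [wi - lr * xi for wi, xi in zip(w, x)]
                        b -= lr
        if mistakes == 0:  # every training bag is classified correctly
            break
    return w, b


# Toy data: feature 0 plays the role of an "event cue", feature 1 of "background".
bags = [
    [[0.0, 1.0], [1.0, 0.0]],  # positive: second region carries the event
    [[0.1, 0.9], [0.9, 0.2]],  # positive
    [[0.0, 1.0], [0.1, 0.8]],  # negative: background only
    [[0.2, 0.9], [0.0, 0.7]],  # negative
]
labels = [1, 1, 0, 0]

w, b = train_mi_perceptron(bags, labels, dim=2)
print([predict_bag(w, b, bag) for bag in bags])  # [1, 1, 0, 0]
print(witness_index(w, b, bags[0]))              # 1
```

The witness index is what links MIL to saliency in this toy setting. The learner is never told which region matters, yet the highest-scoring region of a positive bag ends up being the event-bearing one.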
