Task Specific Visual Saliency Prediction with Memory Augmented Conditional Generative Adversarial Networks

Visual saliency patterns are the result of a variety of factors aside from the image being parsed, however existing approaches have ignored these. To address this limitation, we propose a novel saliency estimation model which leverages the semantic modelling power of conditional generative adversarial networks together with memory architectures which capture the subject's behavioural patterns and task dependent factors. We make contributions aiming to bridge the gap between bottom-up feature learning capabilities in modern deep learning architectures and traditional top-down handcrafted features based methods for task specific saliency modelling. The conditional nature of the proposed framework enables us to learn contextual semantics and relationships among di.erent tasks together, instead of learning them separately for each task. Our studies not only shed light on a novel application area for generative adversarial networks, but also emphasise the importance of task specific saliency modelling and demonstrate the plausibility of fully capturing this context via an augmented memory architecture.

[1]  C. Koch,et al.  Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. , 2008, Journal of vision.

[2]  Fatih Murat Porikli,et al.  Covariance Tracking using Model Update Based on Lie Algebra , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[4]  Yann LeCun,et al.  Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[5]  Abhinav Gupta,et al.  Generative Image Modeling Using Style and Structure Adversarial Networks , 2016, ECCV.

[6]  Nicu Sebe,et al.  Image saliency by isocentric curvedness and color , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  R. Venkatesh Babu,et al.  DeepFix: A Fully Convolutional Neural Network for Predicting Human Eye Fixations , 2015, IEEE Transactions on Image Processing.

[10]  John K. Tsotsos,et al.  Saliency Based on Information Maximization , 2005, NIPS.

[11]  Sridha Sridharan,et al.  Two Stream LSTM: A Deep Fusion Framework for Human Action Recognition , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[12]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[13]  Aykut Erdem,et al.  Visual saliency estimation by nonlinearly integrating features using region covariances. , 2013, Journal of vision.

[14]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[15]  Sridha Sridharan,et al.  Going Deeper: Autonomous Steering with Neural Memory Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[16]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[17]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[18]  Noel E. O'Connor,et al.  SalGAN: Visual Saliency Prediction with Generative Adversarial Networks , 2017, ArXiv.

[19]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Michael Dorr,et al.  Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Chuan Li,et al.  Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks , 2016, ECCV.

[22]  C. Tallon-Baudry,et al.  Unconscious associative memory affects visual processing before 100 ms. , 2008, Journal of vision.

[23]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.

[24]  Liquan Shen,et al.  Saliency detection using regional histograms. , 2013, Optics letters.

[25]  Christof Koch,et al.  Attentional Selection for Object Recognition - A Gentle Way , 2002, Biologically Motivated Computer Vision.

[26]  Rita Cucchiara,et al.  A deep multi-level network for saliency prediction , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[27]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[28]  Gregory J. Zelinsky,et al.  Scene context guides eye movements during visual search , 2006, Vision Research.

[29]  Cordelia Schmid,et al.  Discriminative spatial saliency for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Ivan V. Bajic,et al.  Saliency-Aware Video Compression , 2014, IEEE Transactions on Image Processing.

[31]  Asha Iyer,et al.  Components of bottom-up gaze allocation in natural images , 2005, Vision Research.

[32]  Tianming Liu,et al.  Predicting eye fixations using convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[34]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[35]  A. Treisman,et al.  A feature-integration theory of attention , 1980, Cognitive Psychology.

[36]  Yann LeCun,et al.  Energy-based Generative Adversarial Network , 2016, ICLR.

[37]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[38]  Sridha Sridharan,et al.  Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection , 2017, Neural Networks.

[39]  Hubert Konik,et al.  A Spatiotemporal Saliency Model for Video Surveillance , 2011, Cognitive Computation.

[40]  Matthias Bethge,et al.  Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet , 2014, ICLR.

[41]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[42]  Naila Murray,et al.  End-to-End Saliency Mapping via Probability Distribution Prediction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  J. Henderson,et al.  Initial scene representations facilitate eye movement guidance in visual search. , 2007, Journal of experimental psychology. Human perception and performance.

[44]  S Ullman,et al.  Shifts in selective visual attention: towards the underlying neural circuitry. , 1985, Human neurobiology.

[45]  Sabine Süsstrunk,et al.  Saliency detection for content-aware image resizing , 2009, 2009 16th IEEE International Conference on Image Processing (ICIP).

[46]  Cordelia Schmid,et al.  Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[47]  Harish Katti,et al.  Depth Matters: Influence of Depth Cues on Visual Saliency , 2012, ECCV.

[48]  Cristian Sminchisescu,et al.  Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths , 2013, NIPS.

[49]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[50]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Ali Borji,et al.  Exploiting local and global patch rarities for saliency detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Namil Kim,et al.  Pixel-Level Domain Transfer , 2016, ECCV.

[53]  Asli Çelikyilmaz,et al.  Associative Adversarial Networks , 2016, ArXiv.

[54]  Sridha Sridharan,et al.  Tree Memory Networks for Modelling Long-term Temporal Dependencies , 2017, Neurocomputing.

[55]  Tamara L. Berg,et al.  Learning Temporal Transformations from Time-Lapse Videos , 2016, ECCV.

[56]  G. Zelinsky A theory of eye movements during target acquisition. , 2008, Psychological review.

[57]  C. Koch,et al.  A saliency-based search mechanism for overt and covert shifts of visual attention , 2000, Vision Research.

[58]  S. Yantis,et al.  Visual Attention: Bottom-Up Versus Top-Down , 2004, Current Biology.

[59]  Fatih Murat Porikli,et al.  Saliency-aware geodesic video object segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Krista A. Ehinger,et al.  Modelling search for people in 900 scenes: A combined source model of eye guidance , 2009 .