Visual Memorability for Robotic Interestingness via Unsupervised Online Learning

In this paper, we explore the problem of interesting scene prediction for mobile robots. This area is currently underexplored but is crucial for many practical applications such as autonomous exploration and decision making. Inspired by industrial demands, we first propose a novel translation-invariant visual memory for recalling and identifying interesting scenes, then design a three-stage architecture of long-term, short-term, and online learning. These stages enable our system to acquire human-like experience, environmental knowledge, and online adaptation, respectively. Our approach achieves much higher accuracy than state-of-the-art algorithms on challenging robotic interestingness datasets.
