Active Vision in the Era of Convolutional Neural Networks

In this work, we review the literature on active object recognition, past and present. Earlier methods used a notion of recognition ambiguity to derive a next-best-view policy that disambiguates the object with the fewest camera moves. Present methods, on the other hand, use deep reinforcement learning to learn camera control policies from data. We show on a public dataset that reinforcement learning methods are not superior to a policy of adequately sampling the object view-sphere. Instead of focusing on finding the next best view, we examine a recent method for quantifying recognition uncertainty in deep learning as a potential tool for active object recognition. We find that predictions made with this technique are well calibrated with respect to the network's performance on a test set, which suggests that it could be useful in an active vision scenario.
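To make the notion of calibration concrete, the following is a minimal sketch of one common way to check whether a classifier's confidence matches its test-set accuracy. It uses the expected calibration error (ECE) with equal-width confidence bins. The abstract does not state which calibration measure is used, so the metric, the function name `expected_calibration_error`, and the bin count are assumptions chosen for illustration only.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted mean gap between accuracy and confidence.

    confidences: predicted probability of the chosen class, values in [0, 1].
    correct:     1 where the prediction matched the label, else 0.
    n_bins:      number of equal-width confidence bins (assumed value).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; digitize maps each confidence to a bin index
    # in 0..n_bins-1 (a confidence of exactly 1.0 lands in the last bin).
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1]), 0, n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap  # weight by fraction of samples in the bin
    return ece


if __name__ == "__main__":
    # Toy data: 75% confidence and 75% accuracy, so the gap is zero.
    print(expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0]))
```

A value near zero indicates that stated confidence tracks observed accuracy, which is the property an active vision system would need in order to decide whether another view is required.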
