Learning When and Where to Zoom With Deep Reinforcement Learning

While high-resolution images contain semantically more useful information than their lower-resolution counterparts, processing them is computationally more expensive, and in some applications, e.g., remote sensing, they can be much more expensive to acquire. For these reasons, it is desirable to develop an automatic method that selectively uses high-resolution data only when necessary, maintaining accuracy while reducing acquisition and run-time costs. In this direction, we propose PatchDrop, a reinforcement learning approach that dynamically identifies when and where to use or acquire high-resolution data, conditioned on paired, cheap, low-resolution images. We conduct experiments on the CIFAR10, CIFAR100, ImageNet, and fMoW datasets, where we use significantly less high-resolution data while maintaining accuracy similar to that of models that use full high-resolution images.
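The abstract describes the mechanism only at a high level: a policy looks at the cheap low-resolution image and decides which regions to acquire at high resolution. The NumPy sketch below illustrates one way such a decision step could be wired up. It is not the paper's implementation. The square patch grid, the independent Bernoulli policy, the reward `1 - (fraction of patches acquired)^2` for correct predictions and `-sigma` otherwise, and the REINFORCE gradient with a greedy (self-critical) baseline are all assumptions made for illustration. The function names (`sample_patch_mask`, `compose_image`, `reward`, `policy_grad_logits`) are hypothetical. The policy network and the classifier that consume these pieces are omitted.

```python
import numpy as np


def sample_patch_mask(probs: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw one acquire (1) / skip (0) decision per patch from independent Bernoullis.

    `probs` is assumed to be the policy's per-patch output on the low-res image.
    """
    return (rng.random(probs.shape) < probs).astype(np.int64)


def compose_image(lr_up: np.ndarray, hr: np.ndarray, mask: np.ndarray, patch: int) -> np.ndarray:
    """Paste high-res patches over the upsampled low-res image wherever mask == 1.

    Assumes the patches tile the image as a square grid in row-major order.
    """
    grid = int(round(np.sqrt(mask.size)))
    out = lr_up.copy()
    for idx in np.flatnonzero(mask):
        r, c = divmod(int(idx), grid)
        rows = slice(r * patch, (r + 1) * patch)
        cols = slice(c * patch, (c + 1) * patch)
        out[rows, cols] = hr[rows, cols]
    return out


def reward(mask: np.ndarray, correct: bool, sigma: float = 0.5) -> float:
    """Assumed reward: a correct prediction earns more when fewer HR patches are used."""
    if correct:
        return float(1.0 - (mask.sum() / mask.size) ** 2)
    return -sigma


def policy_grad_logits(probs: np.ndarray, mask: np.ndarray, advantage: float) -> np.ndarray:
    """REINFORCE gradient of advantage * log pi(mask) w.r.t. the Bernoulli logits.

    For p = sigmoid(z), d/dz [a log p + (1 - a) log(1 - p)] = a - p.
    """
    return advantage * (mask - probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lr_up = np.zeros((32, 32))   # placeholder for an upsampled low-res image
    hr = np.ones((32, 32))       # placeholder for the matching high-res image
    probs = np.full(16, 0.25)    # hypothetical policy output for a 4x4 grid of 8x8 patches

    sampled = sample_patch_mask(probs, rng)
    greedy = (probs > 0.5).astype(np.int64)  # self-critical baseline action
    x = compose_image(lr_up, hr, sampled, patch=8)

    # In practice `correct` would come from running the classifier on `x`;
    # here both are simply assumed correct to show the arithmetic.
    adv = reward(sampled, True) - reward(greedy, True)
    print(sampled.sum(), x.mean(), policy_grad_logits(probs, sampled, adv))
```

The advantage compares the sampled mask against the policy's own greedy mask, so acquiring extra patches is only reinforced when it beats what the policy would have done deterministically. This is one common variance-reduction choice, not necessarily the one used here.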
