Representations for Visually Guided Actions

Author(s): Gupta, Saurabh | Advisor(s): Malik, Jitendra | Abstract: In recent times, computer vision has made great leaps towards 2D understanding of sparse visual snapshots of the world. This is insufficient for robots that need to exist and act in the 3D world around them based on a continuous stream of multi-modal inputs. In this work, we present efforts towards bridging this gap between computer vision and robotics. We show how thinking about computer vision and robotics together brings out limitations of current computer vision tasks and techniques, and motivates joint study of perception and action. We present some initial efforts towards this and investigate a) how we can move from 2D understanding of images to 3D understanding of the underlying scene, b) how recent advances in representation learning for images can be extended to obtain representations for varied sensing modalities useful in robotics, and c) how thinking about vision and action together can lead to more effective solutions for the classical problem of visual navigation.

[1]  E. Tolman Cognitive maps in rats and men. , 1948, Psychological review.

[2]  R. Bellman A Markovian Decision Process , 1957 .

[3]  Lawrence G. Roberts,et al.  Machine Perception of Three-Dimensional Solids , 1963, Outstanding Dissertations in the Computer Sciences.

[4]  A. Savitzky,et al.  Smoothing and Differentiation of Data by Simplified Least Squares Procedures. , 1964 .

[5]  David Q. Mayne,et al.  Differential dynamic programming , 1972, The Mathematical Gazette.

[6]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[7]  O. Khatib,et al.  Real-Time Obstacle Avoidance for Manipulators and Mobile Robots , 1985, Proceedings. 1985 IEEE International Conference on Robotics and Automation.

[8]  John Canny,et al.  The complexity of robot motion planning , 1988 .

[9]  Tomaso Poggio,et al.  Cooperative computation of stereo disparity , 1988 .

[10]  Alberto Elfes,et al.  Using occupancy grids for mobile robot perception and navigation , 1989, Computer.

[11]  Y. J. Tejwani,et al.  Robot vision , 1989, IEEE International Symposium on Circuits and Systems,.

[12]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[13]  Benjamin Kuipers,et al.  A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations , 1991, Robotics Auton. Syst..

[14]  Takeo Kanade,et al.  Human Face Detection in Visual Scenes , 1995, NIPS.

[15]  Steven M. Seitz,et al.  View morphing , 1996, SIGGRAPH.

[16]  Jitendra Malik,et al.  Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach , 1996, SIGGRAPH.

[17]  B. Faverjon,et al.  Probabilistic Roadmaps for Path Planning in High-Dimensional Con(cid:12)guration Spaces , 1996 .

[18]  Richard Szeliski,et al.  Creating full view panoramic image mosaics and environment maps , 1997, SIGGRAPH.

[19]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[20]  Yali Amit,et al.  Joint Induction of Shape Features and Tree Classifiers , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  David W. Murray,et al.  Mobile Robot Localisation Using Active Vision , 1998, ECCV.

[22]  Richard Szeliski,et al.  Applications of Computer Vision to Computer Graphics , 2020 .

[23]  Steven M. LaValle,et al.  Rapidly-Exploring Random Trees: Progress and Prospects , 2000 .

[24]  Marc Levoy,et al.  Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[25]  Viii Supervisor Sonar-Based Real-World Mapping and Navigation , 2001 .

[26]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[27]  S. Shankar Sastry,et al.  Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.

[28]  Marc Toussaint,et al.  Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System , 2003, NIPS.

[29]  J. Koenderink,et al.  Representation of local geometry in the visual system , 1987, Biological Cybernetics.

[30]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Peter Stone,et al.  Policy gradient reinforcement learning for fast quadrupedal locomotion , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.

[32]  Emanuel Todorov,et al.  Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems , 2004, ICINCO.

[33]  James R. Bergen,et al.  Visual odometry , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[34]  Michael Gasser,et al.  The Development of Embodied Cognition: Six Lessons from Babies , 2005, Artificial Life.

[35]  Wolfram Burgard,et al.  Probabilistic Robotics (Intelligent Robotics and Autonomous Agents) , 2005 .

[36]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[37]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[38]  Richard Szeliski,et al.  Modeling the World from Internet Photo Collections , 2008, International Journal of Computer Vision.

[39]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[40]  Stefan Schaal,et al.  2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .

[41]  Urs A. Muller,et al.  Learning long-range vision for autonomous off-road driving , 2009 .

[42]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Dieter Fox,et al.  RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments , 2010, ISER.

[45]  David A. Forsyth,et al.  Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry , 2010, ECCV.

[46]  Jürgen Schmidhuber,et al.  Recurrent policy gradients , 2010, Log. J. IGPL.

[47]  Trevor Darrell,et al.  Learning to Recognize Objects from Unseen Modalities , 2010, ECCV.

[48]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[49]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Andrew Zisserman,et al.  Tabula rasa: Model transfer for object category detection , 2011, 2011 International Conference on Computer Vision.

[51]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[52]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[53]  Jonathan T. Barron,et al.  A category-level 3-D object dataset: Putting the Kinect to work , 2011, ICCV Workshops.

[54]  Thorsten Joachims,et al.  Semantic Labeling of 3D Point Clouds for Indoor Scenes , 2011, NIPS.

[55]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[56]  Trevor Darrell,et al.  What you saw is not what you get: Domain adaptation using asymmetric kernel transforms , 2011, CVPR 2011.

[57]  Jitendra Malik,et al.  Semantic segmentation using regions and parts , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  José Ruíz Ascencio,et al.  Visual simultaneous localization and mapping: a survey , 2012, Artificial Intelligence Review.

[59]  Dieter Fox,et al.  Detection-based object labeling in 3D scenes , 2012, 2012 IEEE International Conference on Robotics and Automation.

[60]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[61]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[62]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Deva Ramanan,et al.  Analyzing 3D Objects in Cluttered Images , 2012, NIPS.

[64]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[65]  Kun Zhou,et al.  An interactive approach to semantic modeling of indoor scenes with an RGBD camera , 2012, ACM Trans. Graph..

[66]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[67]  Marc Pollefeys,et al.  Vision-based autonomous mapping and exploration using a quadrotor MAV , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[68]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[69]  Dieter Fox,et al.  RGB-(D) scene labeling: Features and algorithms , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[70]  Ivor W. Tsang,et al.  Learning with Augmented Features for Heterogeneous Domain Adaptation , 2012, ICML.

[71]  James M. Keller,et al.  Histogram of Oriented Normal Vectors for Object Recognition with a Depth Sensor , 2012, ACCV.

[72]  Derek Hoiem,et al.  Support Surface Prediction in Indoor Scenes , 2013, 2013 IEEE International Conference on Computer Vision.

[73]  Sanja Fidler,et al.  Box in the Box: Joint 3D Layout and Object Reasoning from Single Images , 2013, 2013 IEEE International Conference on Computer Vision.

[74]  Cordelia Schmid,et al.  Towards Understanding Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[75]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[76]  Bernt Schiele,et al.  Detailed 3D Representations for Object Recognition and Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[77]  Patric Jensfelt,et al.  Active Visual Object Search in Unknown Environments Using Uncertain Semantics , 2013, IEEE Transactions on Robotics.

[78]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[79]  Silvio Savarese,et al.  Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[80]  Cristian Sminchisescu,et al.  CPMC-3D-O2P: Semantic segmentation of RGB-D images using CPMC and Second Order Pooling , 2013, ArXiv.

[81]  Joseph J. Lim,et al.  Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[83]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[84]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[85]  Sanja Fidler,et al.  Holistic Scene Understanding for 3D Object Detection with RGBD Cameras , 2013, 2013 IEEE International Conference on Computer Vision.

[86]  Jitendra Malik,et al.  Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[87]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[88]  Abhinav Gupta,et al.  Building Part-Based Object Detectors via 3D Geometry , 2013, 2013 IEEE International Conference on Computer Vision.

[89]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[90]  Dieter Fox,et al.  Learning hierarchical sparse features for RGB-(D) object recognition , 2014, Int. J. Robotics Res..

[91]  Sanja Fidler,et al.  What Are You Talking About? Text-to-Image Coreference , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[92]  Martial Hebert,et al.  3DNN: 3D Nearest Neighbor , 2014, International Journal of Computer Vision.

[93]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[94]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[95]  Ruiqi Guo,et al.  Scene understanding with complete scenes and structured representations , 2014 .

[96]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[97]  R. Fergus,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[98]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[99]  Alexei A. Efros,et al.  Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[100]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[101]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[102]  Jianxiong Xiao,et al.  Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[103]  Svetlana Lazebnik,et al.  Scene Parsing with Object Instances and Occlusion Ordering , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[104]  Jitendra Malik,et al.  Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation , 2015, International Journal of Computer Vision.

[105]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[106]  Antonio Torralba,et al.  FPM: Fine Pose Parts-Based Model with 3D CAD Models , 2014, ECCV.

[107]  Dieter Fox,et al.  Unsupervised feature learning for 3D scene labeling , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[108]  Nathan Silberman,et al.  Instance Segmentation of Indoor Scenes Using a Coverage Loss , 2014, ECCV.

[109]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[110]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[111]  Jitendra Malik,et al.  Finding action tubes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[112]  Andrea Vedaldi,et al.  Understanding Image Representations by Measuring Their Equivariance and Equivalence , 2014, International Journal of Computer Vision.

[113]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[114]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[115]  Jitendra Malik,et al.  Aligning 3D models to RGB-D images of cluttered scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[116]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[117]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[118]  Jitendra Malik,et al.  Viewpoints and keypoints , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[119]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[120]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[121]  C. Lawrence Zitnick,et al.  Fast Edge Detection Using Structured Forests , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[122]  Trevor Darrell,et al.  Fully convolutional networks for semantic segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[123]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[124]  Bharath Hariharan,et al.  Beyond Bounding Boxes: Precise Localization of Objects in Images , 2015 .

[125]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[126]  David Silver,et al.  Memory-based control with recurrent neural networks , 2015, ArXiv.

[127]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[128]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[129]  Atsuto Maki,et al.  From generic to specific deep representations for visual recognition , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[130]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[131]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[132]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[133]  Trevor Darrell,et al.  Simultaneous Deep Transfer Across Domains and Tasks , 2015, ICCV.

[134]  Philip H. S. Torr,et al.  Playing Doom with SLAM-Augmented Deep Reinforcement Learning , 2016, ArXiv.

[135]  Sergey Levine,et al.  Learning deep neural network policies with continuous memory states , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[136]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[137]  Jitendra Malik,et al.  Generic 3D Representation via Pose Estimation and Matching , 2016, ECCV.

[138]  Fernando Diaz,et al.  Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains , 2016, ArXiv.

[139]  Honglak Lee,et al.  Control of Memory, Active Perception, and Action in Minecraft , 2016, ICML.

[140]  Jitendra Malik,et al.  Cross Modal Distillation for Supervision Transfer , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[141]  Jürgen Schmidhuber,et al.  A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots , 2016, IEEE Robotics and Automation Letters.

[142]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[143]  Trevor Darrell,et al.  Cross-modal adaptation for RGB-D detection , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[144]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[145]  Pieter Abbeel,et al.  Value Iteration Networks , 2016, NIPS.

[146]  Martial Hebert,et al.  Learning Transferable Policies for Monocular Reactive MAV Control , 2016, ISER.

[147]  Peter L. Bartlett,et al.  RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.

[148]  Sergey Levine,et al.  Backprop KF: Learning Discriminative Deterministic State Estimators , 2016, NIPS.

[149]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[150]  Joel Z. Leibo,et al.  Model-Free Episodic Control , 2016, ArXiv.

[151]  Sergey Levine,et al.  Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.

[152]  Silvio Savarese,et al.  3D Semantic Parsing of Large-Scale Indoor Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[153]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[154]  Sergey Levine,et al.  Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.

[155]  Thomas A. Funkhouser,et al.  MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments , 2017, ArXiv.

[156]  Jana Kosecka,et al.  A dataset for developing and benchmarking active vision , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[157]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[158]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[159]  S. Levine,et al.  Cognitive Mapping and Planning for Visual Navigation Supplementary Material , 2017 .

[160]  Razvan Pascanu,et al.  Learning to Navigate in Complex Environments , 2016, ICLR.

[161]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[162]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[163]  Sergey Levine,et al.  PLATO: Policy learning using adaptive trajectory optimization , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[164]  Vladlen Koltun,et al.  Semi-parametric Topological Memory for Navigation , 2018, ICLR.

[165]  Ruslan Salakhutdinov,et al.  Neural Map: Structured Memory for Deep Reinforcement Learning , 2017, ICLR.

[166]  Yuandong Tian,et al.  Building Generalizable Agents with a Realistic and Rich 3D Environment , 2018, ICLR.

[167]  Qi Wu,et al.  Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[168]  Andrea Vedaldi,et al.  MapNet: An Allocentric Spatial Memory for Mapping Environments , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[169]  Ruslan Salakhutdinov,et al.  Active Neural Localization , 2018, ICLR.

[170]  Ali Farhadi,et al.  IQA: Visual Question Answering in Interactive Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[171]  Vijay Kumar,et al.  Memory Augmented Control Networks , 2017, ICLR.

[172]  Stefan Lee,et al.  Embodied Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[173]  Jitendra Malik,et al.  Gibson Env: Real-World Perception for Embodied Agents , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[174]  Rahul Sukthankar,et al.  Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.