Visual Search and Recognition for Robot Task Execution and Monitoring

Visual search for task-relevant targets in the environment is a crucial robot skill. We propose a preliminary framework for the execution monitoring of a robot task, which governs how the robot visually searches the environment for the targets involved in the task. Visual search is also relevant for recovering from failures. The framework exploits deep reinforcement learning to acquire a "common sense" scene structure, and it takes advantage of a deep convolutional network to detect objects and the relevant relations holding between them. Building on these methods, the framework introduces vision-based execution monitoring, with classical planning as the backbone of task execution. Experiments show that, with the proposed vision-based execution monitor, the robot can complete simple tasks and recover from failures autonomously.
