Visual Search and Recognition for Robot Task Execution and Monitoring

Visual search for task-relevant targets in the environment is a crucial robot skill. We propose a preliminary framework for the execution monitoring of a robot task, which governs how the robot visually searches the environment for the targets involved in the task. Visual search is also relevant for recovering from failures. The framework exploits deep reinforcement learning to acquire a "common sense" scene structure, and it takes advantage of a deep convolutional network to detect objects and the relevant relations holding between them. Building on these methods, the framework introduces vision-based execution monitoring, with classical planning as the backbone of task execution. Experiments show that, with the proposed vision-based execution monitor, the robot can complete simple tasks and recover from failures autonomously.
