Paper Information - Anticipating Next Goal for Robot Plan Prediction

Anticipating Next Goal for Robot Plan Prediction

Goal reasoning is a central objective in robot task execution. We propose a deep model that learns to infer the next goal while performing an activity. Because predicting the next goal state requires a robot language that is not comparable to sentences, we introduce a specific optimization metric tied to the robot's representation of the scene. We tested the method in a warehouse, with a humanoid robot performing tasks to assist a maintenance technician working at a production line.
