Anticipating Next Goal for Robot Plan Prediction

Goal reasoning is a central objective in robot task execution. We propose a deep model that learns to infer the next goal while the robot is performing an activity. Because the next goal state is expressed in a robot language that is not comparable to natural-language sentences, we introduce a dedicated optimization metric tied to the robot's representation of the scene. We evaluated the method in a warehouse, with a humanoid robot assisting a maintenance technician working at a production line.
