Teaching robots to understand new object types and attributes through natural language instructions

Robots often have limited knowledge of their environment and must continuously acquire new knowledge in order to collaborate with humans. To address this issue, this paper presents a method that allows a human to teach a robot new object types and attributes through natural language (NL) instructions. A simple yet robust vision algorithm is proposed to segment objects and describe the relations between them. The segmented objects and their relations constitute the robot's basic knowledge. NL instructions are processed into domain-specific representations that the robot uses to identify the target objects. The target objects, together with the object type or attribute labels referred to in the NL instructions, are collected as training samples from which the robot learns. Experimental results demonstrate the effectiveness and advantages of the proposed method.
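To make the described pipeline concrete, below is a minimal Python sketch of the teach-by-instruction loop as the abstract outlines it. This is not the authors' implementation: all class and function names (SegmentedObject, ParsedInstruction, TeachableRecognizer, ground_target, teach) are hypothetical, and a linear SVM is used as a stand-in for whatever classifier the system actually employs.

```python
# Hypothetical sketch of the teaching loop: segmented objects plus pairwise
# relations form the robot's basic knowledge; an NL instruction is reduced to
# a (referring relation, landmark, new label) representation; the grounded
# target's features and the taught label become a training sample.

from dataclasses import dataclass, field
import numpy as np
from sklearn.svm import LinearSVC


@dataclass
class SegmentedObject:
    object_id: int
    features: np.ndarray                            # appearance features from the vision front end
    relations: dict = field(default_factory=dict)   # e.g. {"left_of": 3, "on": 7}


@dataclass
class ParsedInstruction:
    """Domain-specific representation of e.g. 'The object left of the box is a mug'."""
    new_label: str       # type or attribute being taught ("mug")
    relation: str        # spatial relation in the referring expression ("left_of")
    landmark_id: int     # already-known object the reference is anchored on


class TeachableRecognizer:
    def __init__(self):
        self.samples: list[np.ndarray] = []
        self.labels: list[str] = []
        self.classifier = None

    def ground_target(self, scene: list[SegmentedObject],
                      instr: ParsedInstruction) -> SegmentedObject:
        """Pick the segmented object whose relation matches the instruction."""
        for obj in scene:
            if obj.relations.get(instr.relation) == instr.landmark_id:
                return obj
        raise ValueError("no object satisfies the referring expression")

    def teach(self, scene: list[SegmentedObject], instr: ParsedInstruction) -> None:
        """Collect (features, label) as a training sample and retrain."""
        target = self.ground_target(scene, instr)
        self.samples.append(target.features)
        self.labels.append(instr.new_label)
        if len(set(self.labels)) >= 2:   # a linear classifier needs at least two classes
            self.classifier = LinearSVC().fit(np.vstack(self.samples), self.labels)

    def recognize(self, obj: SegmentedObject) -> str:
        if self.classifier is None:
            return "unknown"
        return self.classifier.predict(obj.features.reshape(1, -1))[0]
```

The key design point the sketch mirrors is that teaching requires no manual annotation beyond the dialogue itself: grounding the referring expression against the robot's existing scene knowledge yields both the training sample and its label.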
