Dual Track Multimodal Automatic Learning through Human-Robot Interaction

Human beings constantly improve their cognitive ability by learning automatically from interaction with the environment. Two important aspects of automatic learning are visual perception and knowledge acquisition, and fusing them is vital for improving the intelligence and interaction performance of robots. Methods for automatic knowledge extraction and for recognition have each been widely studied. However, little work integrates the two into a unified framework that enables joint visual perception and knowledge acquisition. To solve this problem, we propose a Dual Track Multimodal Automatic Learning (DTMAL) system, which consists of two components: Hybrid Incremental Learning (HIL) on the vision track and Multimodal Knowledge Extraction (MKE) on the knowledge track. HIL incrementally improves the recognition ability of the system by learning new object samples and new object concepts. MKE constructs and updates multimodal knowledge items by exploring multimodal signals, drawing on the new objects recognized by HIL and on other knowledge. The two tracks promote each other and jointly contribute to dual track learning. We conducted experiments through human-machine interaction, and the results validate the effectiveness of the proposed system.
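
The interplay between the two tracks can be illustrated with a toy sketch. It is not the DTMAL implementation: the nearest-class-mean learner stands in for HIL, the dictionary-backed store stands in for MKE, and every name in it (`NearestMeanLearner`, `KnowledgeStore`, `interact`, the feature vectors) is a hypothetical chosen for illustration only.

```python
import math
from collections import defaultdict


class NearestMeanLearner:
    """Toy stand-in for the vision track: one running mean per concept."""

    def __init__(self):
        self.sums = {}    # label -> per-dimension feature sums
        self.counts = {}  # label -> number of samples seen

    def learn(self, label, feature):
        # An unseen label adds a new concept; a known label adds a new sample.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(feature)
            self.counts[label] = 0
        self.sums[label] = [s + x for s, x in zip(self.sums[label], feature)]
        self.counts[label] += 1

    def recognize(self, feature):
        # Return the label whose class mean is closest, or None if nothing is learned.
        best, best_dist = None, math.inf
        for label, sums in self.sums.items():
            mean = [s / self.counts[label] for s in sums]
            dist = math.dist(mean, feature)
            if dist < best_dist:
                best, best_dist = label, dist
        return best


class KnowledgeStore:
    """Toy stand-in for the knowledge track: attribute/value items per concept."""

    def __init__(self):
        self.items = defaultdict(dict)

    def update(self, label, attribute, value):
        self.items[label][attribute] = value


def interact(learner, store, feature, told_label=None, facts=None):
    """One interaction turn (hypothetical): learn if a label is given, then store facts."""
    if told_label is not None:
        learner.learn(told_label, feature)
        label = told_label
    else:
        label = learner.recognize(feature)
    if label is not None:
        for attribute, value in (facts or {}).items():
            store.update(label, attribute, value)
    return label
```

In this sketch, a turn that names an object updates the recognizer, and a later turn that only shows the object attaches the stated facts to whichever concept the recognizer returns, which is the sense in which each track feeds the other.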
