An Autonomous Developmental Cognitive Architecture Based on Incremental Associative Neural Network With Dynamic Audiovisual Fusion

Developing cognition is difficult to achieve yet crucial for robots. Infants gradually improve their cognition through parental guidance and self-exploration. Conventional learning methods for robots, however, often focus on a single modality and train a predefined model offline on large datasets. In this paper, we propose a hierarchical autonomous cognitive architecture that lets robots learn object concepts online by interacting with humans. Two pathways are devised, one for visual and one for auditory information, and each has three layers based on self-organizing incremental neural networks. In the sample layers, visual features and object names are learned incrementally and self-organized without supervision; for these layers we propose a dynamically adjustable similarity threshold strategy that lets the network itself control clustering rather than rely on a predefined threshold. Two symbol layers abstract the clustering results of their corresponding sample layers into concise symbols and transmit them to an associative layer. In the associative layer, an association between the two modalities is built in real time by binding simultaneously activated visual and auditory symbols. In this layer, a top-down response strategy lets robots autonomously recall the associated modality, resolve conflicting associative relationships, and adjust learned knowledge from the top down. Experimental results on two object datasets and a real task show that our architecture efficiently learns and associates object views and names online. Moreover, the robot can autonomously improve its cognitive level by drawing on its own experience without querying humans.
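
The abstract's data flow (sample layer, symbol, associative binding, top-down recall) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class names `SampleLayer` and `AssociativeLayer` are hypothetical, and the per-node threshold uses the classic SOINN rule (distance to the nearest other node) as a stand-in, since the abstract does not specify the proposed dynamic threshold strategy. The sketch also assumes that a node index can serve directly as a visual symbol, that object names arrive as ready-made strings, and that conflicting associations are resolved by co-activation counts.

```python
import math
from collections import defaultdict


class SampleLayer:
    """Toy incremental clusterer with a per-node adaptive threshold (assumed rule)."""

    def __init__(self):
        self.nodes = []   # prototype vectors, one per cluster
        self.counts = []  # number of samples absorbed by each prototype

    def _threshold(self, i):
        # Stand-in rule from classic SOINN: distance to the nearest other node.
        others = [math.dist(self.nodes[i], n)
                  for j, n in enumerate(self.nodes) if j != i]
        return min(others) if others else math.inf

    def _add_node(self, x):
        self.nodes.append(x)
        self.counts.append(1)
        return len(self.nodes) - 1

    def learn(self, x):
        """Absorb sample x online; return the index of the node representing it."""
        x = list(x)
        if len(self.nodes) < 2:
            return self._add_node(x)  # the first two samples seed the network
        winner = min(range(len(self.nodes)),
                     key=lambda i: math.dist(x, self.nodes[i]))
        if math.dist(x, self.nodes[winner]) > self._threshold(winner):
            return self._add_node(x)  # too far from every cluster: new node
        # Otherwise move the winner toward x with a decaying learning rate.
        self.counts[winner] += 1
        rate = 1.0 / self.counts[winner]
        self.nodes[winner] = [w + rate * (xi - w)
                              for w, xi in zip(self.nodes[winner], x)]
        return winner


class AssociativeLayer:
    """Binds co-activated visual and auditory symbols and recalls across modalities."""

    def __init__(self):
        self.links = defaultdict(lambda: defaultdict(int))

    def bind(self, visual_symbol, audio_symbol):
        # Count each simultaneous activation of the two symbols.
        self.links[visual_symbol][audio_symbol] += 1

    def recall_name(self, visual_symbol):
        # Top-down recall; conflicts go to the most frequent binding (assumption).
        candidates = self.links.get(visual_symbol)
        if not candidates:
            return None
        return max(candidates, key=candidates.get)


visual = SampleLayer()
assoc = AssociativeLayer()
lessons = [((0.0, 0.0), "cup"), ((10.0, 10.0), "ball"),
           ((0.5, 0.5), "cup"), ((9.5, 9.5), "ball")]
for view, name in lessons:
    assoc.bind(visual.learn(view), name)

# A new view close to the first cluster recalls the name bound to that cluster.
recalled = assoc.recall_name(visual.learn((0.2, 0.1)))
print(recalled)
```

Because each node derives its threshold from its current neighbours, no global similarity threshold has to be chosen in advance, which is the property the abstract emphasizes. The actual strategy in the paper may differ from this rule.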
