Brain-inspired model for early vocal learning and correspondence matching using free-energy optimization

We propose a developmental model inspired by the cortico-basal ganglia system (CX-BG) for vocal learning in infants and for solving the correspondence mismatch problem they face when they hear unfamiliar voices with different tones and pitches. The model is based on the neural architecture INFERNO, which stands for Iterative Free-Energy Optimization of Recurrent Neural Networks. Free-energy minimization is used to rapidly explore, select, and learn the optimal actions to perform (e.g., sound production) in order to reproduce and control, as accurately as possible, the spike trains representing desired perceptions (e.g., sound categories). In this paper we detail the CX-BG system responsible for causally linking sound and motor primitives on the order of a few milliseconds. Two experiments, performed with a small and a large audio database, show the exploration, generalization, and noise-robustness capabilities of our neural architecture in retrieving audio primitives during vocal learning and during acoustic matching with unheard voices (different genders and tones).
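The idea of iteratively exploring actions until the produced perception matches a desired one can be illustrated with a small toy sketch. This is not the INFERNO architecture: it uses no spiking neurons, a plain squared prediction error as a crude stand-in for free energy, and a random tanh mapping as a hypothetical sound-production model. All names (`forward_model`, `prediction_error`, `iterative_search`) and all parameter values are assumptions made for illustration only.

```python
import math
import random


def forward_model(motor, weights):
    """Hypothetical sound production: map a motor command to sound features."""
    return [math.tanh(sum(w * m for w, m in zip(row, motor))) for row in weights]


def prediction_error(produced, desired):
    """Squared error, used here only as a crude proxy for free energy."""
    return sum((p - d) ** 2 for p, d in zip(produced, desired))


def iterative_search(desired, weights, n_iters=500, noise_scale=0.1, seed=0):
    """Stochastic search: keep a noisy motor perturbation only if it lowers the error."""
    rng = random.Random(seed)
    motor = [0.0] * len(weights[0])
    best_error = prediction_error(forward_model(motor, weights), desired)
    history = [best_error]
    for _ in range(n_iters):
        # Explore: perturb the current motor command with Gaussian noise.
        candidate = [m + noise_scale * rng.gauss(0.0, 1.0) for m in motor]
        error = prediction_error(forward_model(candidate, weights), desired)
        # Select: retain the candidate only when the error decreases.
        if error < best_error:
            motor, best_error = candidate, error
        history.append(best_error)
    return motor, history


# Toy setup (assumed sizes): 4 motor dimensions, 8 sound features.
setup_rng = random.Random(42)
weights = [[setup_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
target_motor = [setup_rng.uniform(-1.0, 1.0) for _ in range(4)]
desired = forward_model(target_motor, weights)

motor, history = iterative_search(desired, weights)
print(f"initial error: {history[0]:.4f}, final error: {history[-1]:.4f}")
```

Because a perturbation is accepted only when it reduces the error, the recorded error never increases across iterations. The search needs no gradient of the production model, which is the property this sketch is meant to convey.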
