A unified model of the joint development of disparity selectivity and vergence control

Reinforcement learning is a prime candidate for a general mechanism by which animals and humans learn to choose progressively better behavioral options. An important open problem is how the brain finds representations of the relevant sensory input on which such learning can operate. Extensive empirical data show that these representations are themselves adapted throughout development. Thus, learning sensory representations for a task and learning the task solution occur simultaneously. Here we propose a novel framework for efficient coding and task learning in the full perception and action cycle and apply it to the learning of a disparity representation for vergence eye movements. Our approach integrates the learning of a generative model of sensory signals with the learning of a behavior policy, both driven by the same objective: making the generative model work as effectively as possible. We show that this naturally leads to a self-calibrating system that learns to represent binocular disparity and to produce accurate vergence eye movements. The framework is general and could help explain the development of various sensorimotor behaviors and their underlying representations.
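The shared objective can be illustrated with a minimal sketch under strong simplifying assumptions; it is not the model described in this paper. Here the "generative model" is a fixed, hypothetical basis whose functions are identical in both eyes (standing in for a learned binocular code), the scene is one-dimensional white noise, and the learner observes the true disparity directly as its state. All function names and parameters are illustrative. The reward is the negative reconstruction error of the binocular input, so the vergence command that best serves the model is also the one that cancels the disparity.

```python
import numpy as np

def binocular_patch(signal, center, width, disparity, vergence):
    """Concatenate left- and right-eye patches of a 1-D scene.

    The right eye is offset by the residual disparity (disparity - vergence).
    """
    shift = disparity - vergence
    left = signal[center:center + width]
    right = signal[center + shift:center + shift + width]
    return np.concatenate([left, right])

def zero_disparity_basis(width):
    """Assumed stand-in for a learned model: each basis function is the same in both eyes."""
    eye = np.eye(width)
    return np.vstack([eye, eye]) / np.sqrt(2.0)  # orthonormal columns

def reconstruction_error(x, basis):
    """Squared error left after projecting x onto the span of the basis columns."""
    coeffs = basis.T @ x
    return float(np.sum((x - basis @ coeffs) ** 2))

def train_vergence_values(n_steps=2000, width=16, seed=0):
    """Learn action values for vergence commands from the coding-quality reward."""
    rng = np.random.default_rng(seed)
    disparities = np.arange(-3, 4)  # possible scene disparities (pixels)
    actions = np.arange(-3, 4)      # possible vergence commands (pixels)
    q = np.zeros((len(disparities), len(actions)))
    basis = zero_disparity_basis(width)
    for _ in range(n_steps):
        signal = rng.standard_normal(64)      # white-noise scene
        s = rng.integers(len(disparities))    # state: index of true disparity
        a = rng.integers(len(actions))        # uniform random exploration
        x = binocular_patch(signal, 24, width, disparities[s], actions[a])
        reward = -reconstruction_error(x, basis)  # same objective as the model
        q[s, a] += 0.1 * (reward - q[s, a])       # running-average value update
    return disparities, actions, q

if __name__ == "__main__":
    disparities, actions, q = train_vergence_values()
    for d, a in zip(disparities, actions[q.argmax(axis=1)]):
        print(f"disparity {d:+d} -> preferred vergence {a:+d}")
```

In this toy setting only the action values are learned; the basis stays fixed, so the sketch shows how a coding-quality reward can drive vergence, not the joint adaptation of representation and policy.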
