Active Bayesian perception and reinforcement learning

In a series of papers, we have formalized an active Bayesian perception approach for robotics based on recent progress in understanding animal perception. However, an open issue for applied robot perception is how to tune this method to a task, using: (i) a belief threshold that adjusts the speed-accuracy tradeoff; and (ii) an active control strategy for relocating the sensor, e.g. to a preset fixation point. Here we propose that these two variables should be learnt by reinforcement from a reward signal that evaluates the decision outcome. We test this claim with a biomimetic fingertip that senses surface curvature under uncertainty about contact location. An appropriate formulation of the problem allows multi-armed bandit methods to be used to optimize the threshold and fixation point of the active perception. In consequence, the system learns to balance speed against accuracy and sets the fixation point to optimize both quantities. Although we consider one example in robot touch, we expect the underlying principles to have general applicability.
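The scheme described above can be sketched in two parts: sequential Bayesian evidence accumulation that stops when the maximum posterior crosses a belief threshold, and an epsilon-greedy multi-armed bandit whose arms are (threshold, fixation point) pairs, rewarded on the decision outcome minus a time cost. This is a minimal illustration, not the authors' implementation: the binary sensor model, the `sensor_model` discriminability curve peaking at an (unknown to the learner) optimal fixation, and all numeric parameters are assumptions for the sketch.

```python
import math
import random

def sensor_model(fixation, optimal_fix=0.0, floor=0.55, gain=0.35):
    # Hypothetical sensor: discriminability between two percept classes
    # peaks when the sensor is relocated to the optimal fixation point.
    p = floor + gain * math.exp(-(fixation - optimal_fix) ** 2)
    return [[p, 1 - p], [1 - p, p]]  # per-class likelihoods of a binary observation

def bayesian_decision(likelihoods, true_class, threshold, rng, max_steps=100):
    # Accumulate evidence until the max posterior exceeds the belief
    # threshold; returns (decision, number of sensing steps taken).
    n = len(likelihoods)
    posterior = [1.0 / n] * n
    for step in range(1, max_steps + 1):
        obs = rng.choices(range(2), weights=likelihoods[true_class])[0]
        posterior = [p * likelihoods[c][obs] for c, p in enumerate(posterior)]
        z = sum(posterior)
        posterior = [p / z for p in posterior]
        best = max(range(n), key=lambda c: posterior[c])
        if posterior[best] > threshold:
            return best, step
    return best, max_steps

def run_bandit(trials=3000, epsilon=0.1, time_cost=0.02, seed=0):
    # Epsilon-greedy bandit over discretized (threshold, fixation) arms.
    rng = random.Random(seed)
    thresholds = [0.8, 0.9, 0.99]
    fixations = [-1.0, 0.0, 1.0]
    arms = [(t, f) for t in thresholds for f in fixations]
    counts = [0] * len(arms)
    values = [0.0] * len(arms)
    for _ in range(trials):
        if rng.random() < epsilon:
            a = rng.randrange(len(arms))       # explore
        else:
            a = max(range(len(arms)), key=lambda i: values[i])  # exploit
        threshold, fixation = arms[a]
        true_class = rng.randrange(2)
        decision, steps = bayesian_decision(
            sensor_model(fixation), true_class, threshold, rng)
        # Reward evaluates the decision outcome; the time penalty
        # makes the bandit trade accuracy against decision speed.
        reward = (1.0 if decision == true_class else 0.0) - time_cost * steps
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean
    return arms[max(range(len(arms)), key=lambda i: values[i])]
```

Raising the time cost pushes the learned threshold lower (faster, less accurate decisions), while the fixation arm is drawn toward the point where the sensor is most discriminating, mirroring the speed-accuracy and fixation-point tuning described in the abstract.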
