Q-Learning in Continuous State-Action Space with Noisy and Redundant Inputs by Using a Selective Desensitization Neural Network

When applying reinforcement learning (RL) algorithms such as Q-learning to real-world applications, we must consider the influence of sensor noise. The simplest way to reduce this influence is to add other types of sensors, but doing so enlarges the state space and is likely to introduce redundancy. Conventional value-function approximators used for RL in continuous state-action space do not cope well with such situations. The selective desensitization neural network (SDNN) offers high generalization ability and robustness against noise and redundant inputs. We therefore propose an SDNN-based value-function approximator for Q-learning in continuous state-action space and evaluate its performance in terms of robustness against redundant inputs and sensor noise. The results show that the proposed method is highly robust against noise and redundant inputs and enables the agent to take better actions by using additional inputs without degrading learning efficiency. These properties are eminently advantageous in real-world applications such as robotic systems.
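The abstract does not give the SDNN architecture or the learning rule in detail, so the following is only a minimal illustrative sketch of the general idea: Q-learning over a discretized action set, with two continuous inputs coded as mutually desensitized binary patterns that feed a linear output layer. The coding scheme, the toy environment, and all names and constants (e.g. `features`, `q_learning_step`, N, BINS) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64                                  # units per code pattern (assumed)
BINS = 16                               # discretization of each input (assumed)
ACTIONS = np.linspace(-1.0, 1.0, 9)     # discretized action set (assumed)

# Fixed random bipolar code patterns for the discretized values of two inputs.
codes_x = rng.choice([-1.0, 1.0], size=(BINS, N))
codes_y = rng.choice([-1.0, 1.0], size=(BINS, N))

def code(value, codes, lo=-1.0, hi=1.0):
    """Map a continuous value in [lo, hi] to its nearest code pattern."""
    idx = int(np.clip((value - lo) / (hi - lo) * (BINS - 1), 0, BINS - 1))
    return codes[idx]

def features(x, y):
    """Simplified selective desensitization: each input's pattern is gated
    (units set to 0) wherever the other input's pattern is negative."""
    px, py = code(x, codes_x), code(y, codes_y)
    return np.concatenate([px * (py > 0), py * (px > 0)])

W = np.zeros((len(ACTIONS), 2 * N))     # linear output weights, one row per action

def q_values(x, y):
    return W @ features(x, y)

alpha, gamma, eps = 0.1, 0.95, 0.1      # learning rate, discount, exploration

def q_learning_step(x, y):
    """One epsilon-greedy Q-learning update on a toy transition in which the
    reward is highest when the action cancels x; y is a redundant input."""
    a_idx = (rng.integers(len(ACTIONS)) if rng.random() < eps
             else int(np.argmax(q_values(x, y))))
    a = ACTIONS[a_idx]
    reward = -abs(x + a)
    nx = float(np.clip(x + 0.1 * a, -1.0, 1.0))
    td_target = reward + gamma * np.max(q_values(nx, y))
    td_error = td_target - q_values(x, y)[a_idx]
    W[a_idx] += alpha * td_error * features(x, y)
    return nx

x = 0.5
for _ in range(200):
    noisy_y = x + 0.05 * rng.standard_normal()   # redundant, noisy copy of x
    x = q_learning_step(x, noisy_y)
```

In this sketch the second input carries no new information, only a noisy copy of the first; the point of the desensitization coding in the paper is that such redundant, noisy inputs should not degrade the learned Q-function.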
