This paper introduces an approach to the Q-learning algorithm based on rough set theory, introduced by Zdzislaw Pawlak in 1981. During Q-learning, an agent selects actions in an effort to maximize a reward signal obtained from the environment. Based on this reward, the agent adjusts its policy for future actions. The problem considered in this paper is the overestimation of the expected value of cumulative future discounted rewards, which is used to evaluate agent actions and policies during reinforcement learning. Because the discounted reward is overestimated, action evaluation and policy changes are inaccurate. The proposed solution is a form of the Q-learning algorithm that combines approximation spaces with Q-learning to estimate the expected value of returns on actions. This is made possible by considering the behavior patterns of an agent within the scope of approximation spaces. The framework provided by an approximation space makes it possible to measure the degree to which agent behaviors are a part of ("covered by") a set of accepted agent behaviors that serves as a behavior evaluation norm.
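The sketch below is a minimal illustration, not the authors' implementation, of the idea described above: a tabular Q-learning update in which the reward used to form the target is weighted by a rough-coverage measure, i.e. the fraction of an agent's recent behavior patterns that fall within a set of accepted ("norm") behaviors. All names here (rough_coverage, accepted_behaviors, recent_behaviors) and the specific weighting scheme are illustrative assumptions rather than details taken from the paper.

```python
from collections import defaultdict

def rough_coverage(recent_behaviors, accepted_behaviors):
    """Degree to which observed behaviors are covered by the accepted-behavior norm.

    This is a simplified stand-in for the coverage measure an approximation space
    would provide; it simply counts exact membership in the accepted set.
    """
    if not recent_behaviors:
        return 0.0
    covered = sum(1 for b in recent_behaviors if b in accepted_behaviors)
    return covered / len(recent_behaviors)

def q_update(Q, state, action, reward, next_state, actions,
             coverage, alpha=0.1, gamma=0.9):
    """One Q-learning step; the reward signal is scaled by the coverage measure."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = coverage * reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# Toy usage: two states, two actions, behaviors recorded as (state, action) pairs.
Q = defaultdict(float)
actions = ["left", "right"]
accepted_behaviors = {("s0", "right"), ("s1", "right")}    # behavior evaluation norm
recent_behaviors = [("s0", "right"), ("s0", "left")]       # observed behavior patterns
cov = rough_coverage(recent_behaviors, accepted_behaviors)  # 0.5 in this toy example
q_update(Q, "s0", "right", reward=1.0, next_state="s1",
         actions=actions, coverage=cov)
```

Scaling the reward by the coverage value is only one plausible way to fold the behavior-evaluation norm into the return estimate; the paper's actual combination of approximation spaces and Q-learning may differ.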