Paper Information - Efficient Value Function Approximation Using Regression Trees

Efficient Value Function Approximation Using Regression Trees

Value function approximation is a problem central to reinforcement learning. Many applications of reinforcement learning have relied on neural network function approximators, which are very slow to train and require substantial parameter tweaking to obtain good performance. Other reinforcement learning studies have applied nearest neighbor and CMAC function approximators, but these cannot scale to problems with many features, especially if some features are irrelevant. We describe initial work on a new function approximation method that uses regression trees to represent value functions. A novel aspect of our method is its error criterion, which combines three terms: the supervised training error, a Bellman error term, and an advantage error term. By using this composite error criterion, we are able to combine many of the benefits of fitted value iteration, TD(0), and advantage updating. The new method is compared experimentally to previous work that employed TD(λ) to solve job-shop scheduling problems (Zhang & Dietterich, 1996). The results show that the new method performs as well as the neural network method employed in that work, and that it can be trained in much less time. Our new method shows promise of providing a function approximator that is much more efficient and much easier to apply than neural network methods.
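The abstract names the three terms of the composite error criterion but does not give their exact form. As a rough illustration only, the sketch below shows one way such a criterion could be computed for a deterministic, undiscounted problem. Everything specific in it is an assumption rather than the authors' definition: the function name `composite_error`, the `weights` parameter, the squared-error form of each term, and the one-sided (hinge) form of the advantage term.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable


def composite_error(
    value: Callable[[State], float],
    targets: Dict[State, float],
    successors: Dict[State, List[Tuple[float, State]]],
    best_action: Dict[State, int],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of supervised, Bellman, and advantage errors.

    Hypothetical formulation: the term definitions below are assumptions
    made for illustration, not taken from the paper.
    """
    w_sup, w_bell, w_adv = weights
    supervised = bellman = advantage = 0.0
    for s, target in targets.items():
        # Supervised term: squared distance from the given training target.
        supervised += (value(s) - target) ** 2

        # States without listed successors contribute only the supervised term.
        if not successors.get(s):
            continue

        # One-step backup for each action: reward plus value of the next state.
        backups = [r + value(s_next) for r, s_next in successors[s]]
        best = backups[best_action[s]]

        # Bellman term: V(s) should agree with the backup of the preferred action.
        bellman += (value(s) - best) ** 2

        # Advantage term: penalize any other action whose backup looks better
        # than the preferred action's backup.
        for i, q in enumerate(backups):
            if i != best_action[s]:
                advantage += max(0.0, q - best) ** 2

    return w_sup * supervised + w_bell * bellman + w_adv * advantage


# Toy usage: state "a" has two actions leading to "b" and "c".
toy_values = {"a": 1.0, "b": 2.0, "c": 0.0}
toy_error = composite_error(
    value=toy_values.__getitem__,
    targets={"a": 2.0},
    successors={"a": [(0.0, "b"), (3.0, "c")]},
    best_action={"a": 0},
)
```

In this toy case each of the three terms contributes 1.0, so `toy_error` is 3.0 with unit weights. Setting a weight to zero removes the corresponding term, which is one simple way to see how each component affects the total.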

Thomas G. Dietterich | Xin Wang

[1] James S. Albus, et al. New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), 1975.

[2] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.

[3] G. Tesauro. Practical Issues in Temporal Difference Learning, 1992.

[4] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.

[5] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network, 1995, NIPS.

[6] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[7] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.

[8] Geoffrey E. Hinton, et al. Using Pairs of Data-Points to Define Splits for Decision Trees, 1995, NIPS.

[9] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse, 1996.

[10] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.

[11] Andrew W. Moore, et al. Learning Evaluation Functions for Large Acyclic Domains, 1996, ICML.

[12] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[13] Wei Zhang, et al. Reinforcement Learning for Job Shop Scheduling, 1996.