Approximate Policy Iteration using Large-Margin Classifiers

We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy at a subset of states, and a classifier to learn the improved policy from these estimates and generalize it over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and on the bicycle balancing and riding domains.
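A minimal sketch of the loop described above, built on assumptions that do not come from the paper. A hypothetical deterministic 1-D corridor MDP stands in for the pendulum and bicycle simulators. A 1-nearest-neighbour classifier replaces the multiclass SVM so the example needs no dependencies. A sampled state becomes a training example only when its best rollout estimate beats every other action by a small margin. All names (`rollout_q`, `NearestNeighbourPolicy`, `rollout_policy_iteration`) are illustrative.

```python
# Hypothetical toy MDP (not from the paper): corridor of states 0..N_STATES-1.
# Actions move left (-1) or right (+1); reaching the last state gives reward 1 and ends the episode.
N_STATES = 10
ACTIONS = (-1, 1)
GAMMA = 0.95


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False


def rollout_q(state, action, policy, horizon=30, n_rollouts=5):
    """Monte Carlo estimate of Q^pi(state, action): take `action`, then follow `policy`.
    With a stochastic simulator the average over n_rollouts reduces variance;
    this toy simulator is deterministic, so every rollout returns the same value."""
    total = 0.0
    for _ in range(n_rollouts):
        s, r, done = step(state, action)
        ret, discount = r, GAMMA
        for _ in range(horizon):
            if done:
                break
            s, r, done = step(s, policy(s))
            ret += discount * r
            discount *= GAMMA
        total += ret
    return total / n_rollouts


class NearestNeighbourPolicy:
    """Stand-in for the multiclass SVM: predicts the label of the closest training state."""

    def __init__(self, examples):
        self.examples = examples  # list of (state, best_action) pairs

    def __call__(self, state):
        nearest = min(self.examples, key=lambda ex: abs(ex[0] - state))
        return nearest[1]


def rollout_policy_iteration(initial_policy, train_states, iterations=3, margin=1e-6):
    policy = initial_policy
    for _ in range(iterations):
        examples = []
        for s in train_states:
            # Estimate every action's value under the current policy via rollouts.
            q = {a: rollout_q(s, a, policy) for a in ACTIONS}
            best = max(q, key=q.get)
            runner_up = max(q[a] for a in ACTIONS if a != best)
            # Keep the state only if one action is a clear winner (assumed margin test).
            if q[best] - runner_up > margin:
                examples.append((s, best))
        if examples:
            # The classifier generalizes the labelled states to the whole state space.
            policy = NearestNeighbourPolicy(examples)
    return policy


def always_left(state):
    return -1


final = rollout_policy_iteration(always_left, train_states=range(N_STATES - 1))
print([final(s) for s in range(N_STATES - 1)])  # → [1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Starting from a policy that always moves left, the first iteration finds a clear winner only in the state next to the goal. The classifier generalizes that single label to every state, and the next round of rollouts confirms "move right" everywhere.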
