Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration

This paper proposes Compressive Kernelized Reinforcement Learning (CKRL), a new framework for computing near-optimal policies in sequential decision making under uncertainty that combines non-adaptive, data-independent Random Projections with nonparametric Kernelized Least-Squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality reduction technique in which high-dimensional data are projected onto a random lower-dimensional subspace, for example via spherically random rotations or coordinate sampling. KLSPI introduces the kernel trick into the LSPI framework for reinforcement learning, often achieving faster convergence and providing automatic feature selection through various kernel sparsification approaches. In CKRL, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a set of random basis functions. We first show that Random Projections constitute an efficient sparsification technique and that our method often converges faster than standard LSPI at lower computational cost. The theoretical foundation of the approach is a fast randomized approximation of the Singular Value Decomposition (SVD). Finally, simulation results on benchmark MDP domains confirm gains in both computation time and performance in large feature spaces.
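To make the core mechanism concrete, the sketch below combines a Gaussian random projection with batch least-squares temporal-difference (LSTD) policy evaluation, the building block of LSPI. It is a minimal Python illustration on synthetic data; the function names (make_projection, lstd) and all parameter values are hypothetical and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def make_projection(D, d):
    # Gaussian random projection matrix mapping R^D -> R^d, scaled by
    # 1/sqrt(d) so that inner products are preserved in expectation
    # (Johnson-Lindenstrauss style).
    return rng.standard_normal((D, d)) / np.sqrt(d)

def lstd(Phi, Phi_next, rewards, gamma=0.95, reg=1e-6):
    # Batch LSTD(0): solve A w = b with A = Phi^T (Phi - gamma Phi'),
    # b = Phi^T r; a small ridge term keeps A well conditioned.
    A = Phi.T @ (Phi - gamma * Phi_next) + reg * np.eye(Phi.shape[1])
    b = Phi.T @ rewards
    return np.linalg.solve(A, b)

# Synthetic batch of n transitions with D-dimensional features,
# compressed to d dimensions before policy evaluation.
n, D, d = 500, 1000, 20
Phi = rng.standard_normal((n, D))        # features of visited states
Phi_next = rng.standard_normal((n, D))   # features of successor states
rewards = rng.standard_normal(n)

P = make_projection(D, d)                  # one shared projection for both
w = lstd(Phi @ P, Phi_next @ P, rewards)   # weight vector in R^d
values = (Phi @ P) @ w                     # approximate state values

Note that the same projection matrix must be applied to both the current and successor features. In a full CKRL loop, this evaluation step would alternate with greedy policy improvement, as in LSPI, and the raw features could themselves be kernel activations, as in KLSPI.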
