Paper Information - Efficient Batch-Mode Reinforcement Learning Using Extreme Learning Machines

As a class of batch-mode reinforcement learning (RL) methods for Markov decision problems with large or continuous state spaces, approximate policy iteration (API) has received increasing attention in recent decades. One open problem in the design of API algorithms is how to construct the basis functions, or features, for value function approximation (VFA). In this paper, we propose a novel batch-mode RL approach that uses randomly projected features for VFA. The proposed approach can be viewed as an extension of extreme learning machines (ELMs) to RL problems and is therefore called ELM-API. ELMs have been widely studied in supervised learning, but little work has extended them to learning control problems. Compared with previous API algorithms, the proposed approach generates the features for VFA quickly, without complex parameter selection, and its performance adapts to different sample sets in batch-mode RL. In particular, ELM-API achieves fast and efficient feature reconstruction when the training sample sets are relatively small. Comprehensive simulation studies on two benchmark learning control problems were carried out to test the performance of API algorithms with different feature construction methods. The results show that ELM-API obtains comparable or better performance than the previous API approaches. To further show the effectiveness of ELM-API in real-world applications, simulation results are also reported for a more challenging high-dimensional lane-changing decision problem in a dynamic traffic environment; they show that ELM-API can learn satisfactory lane-changing policies with high data efficiency.
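The idea summarized above, random hidden-layer features feeding a least-squares policy-iteration loop, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an LSPI-style LSTD-Q evaluation step, sigmoid hidden nodes with weights drawn uniformly from [-1, 1], a discrete action set, and a small ridge term for numerical stability. The names `make_elm_features`, `lstdq`, and `elm_api` are hypothetical, as are all default parameter values.

```python
import numpy as np

def make_elm_features(state_dim, n_actions, n_hidden, rng):
    """Build phi(s, a): a fixed random sigmoid layer, one feature block per action."""
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, state_dim))  # random input weights, never trained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)               # random biases, never trained

    def phi(state, action):
        h = 1.0 / (1.0 + np.exp(-(W @ np.asarray(state, dtype=float) + b)))
        out = np.zeros(n_hidden * n_actions)
        out[action * n_hidden:(action + 1) * n_hidden] = h  # only the chosen action's block is active
        return out

    return phi

def lstdq(samples, phi, n_features, n_actions, weights, gamma, ridge=1e-3):
    """Evaluate the greedy policy of `weights` by solving the LSTD-Q system A w = b."""
    A = ridge * np.eye(n_features)  # assumed ridge regularization
    b = np.zeros(n_features)
    for s, a, r, s_next, done in samples:
        f = phi(s, a)
        if done:
            f_next = np.zeros(n_features)  # no bootstrap from terminal states
        else:
            a_next = max(range(n_actions), key=lambda u: phi(s_next, u) @ weights)
            f_next = phi(s_next, a_next)
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def elm_api(samples, state_dim, n_actions, n_hidden=30, gamma=0.95,
            n_iters=20, tol=1e-4, seed=0):
    """Batch-mode API over a fixed sample set of (s, a, r, s_next, done) tuples."""
    rng = np.random.default_rng(seed)
    phi = make_elm_features(state_dim, n_actions, n_hidden, rng)
    n_features = n_hidden * n_actions
    w = np.zeros(n_features)
    for _ in range(n_iters):
        w_new = lstdq(samples, phi, n_features, n_actions, w, gamma)
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break

    def policy(state):
        # Greedy action with respect to the learned linear Q-function.
        return int(np.argmax([phi(state, u) @ w for u in range(n_actions)]))

    return policy, w
```

Only the output weights `w` are fitted; the hidden layer stays at its random initial values, which is what removes the need for hand-tuned basis-function parameters in this sketch. Calling `elm_api` again with a different `seed` regenerates the features for the same sample set.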
