Paper information - Learning control of a bioreactor system using kernel-based heuristic dynamic programming

To solve the learning control problem of a bioreactor system, this paper presents a novel framework of heuristic dynamic programming (HDP) with sparse kernel machines, which integrates kernel methods into the critic learning of HDP. As a class of adaptive critic designs (ACDs), HDP has been used to realize online learning control of dynamical systems, where neural networks are commonly employed to approximate the value functions or policies. However, difficulties remain in the design and implementation of HDP; in particular, its learning efficiency and convergence rely heavily on the empirical design of the critic. In this paper, Kernel HDP (KHDP) is proposed by using sparse kernel machines, and its performance is analyzed both theoretically and empirically. Owing to the representation learning and nonlinear approximation ability of sparse kernel machines, KHDP can obtain better performance than previous HDP methods with manually designed neural networks. Simulation results demonstrate the effectiveness of the proposed method.
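The abstract does not spell out how the kernel critic is sparsified or trained, so the following is only a minimal sketch of the general idea, not the authors' KHDP algorithm. It assumes a Gaussian kernel, an approximate linear dependence (ALD) test for building a sparse dictionary of states, and a plain TD(0) update of the critic weights. All function names, parameters (`mu`, `width`, `lr`, `gamma`), and the toy dynamics in the demo are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, width=1.0):
    """Gaussian (RBF) kernel between two state vectors."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * width ** 2)))


def build_dictionary(states, mu=0.01, width=1.0):
    """Assumed ALD sparsification: keep a state only if its feature vector
    cannot be approximated (within mu) by the states already kept."""
    dictionary = [states[0]]
    for x in states[1:]:
        K = np.array([[gaussian_kernel(a, b, width) for b in dictionary]
                      for a in dictionary])
        k = np.array([gaussian_kernel(d, x, width) for d in dictionary])
        # A small jitter keeps the linear solve well conditioned.
        c = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)
        delta = gaussian_kernel(x, x, width) - k @ c
        if delta > mu:
            dictionary.append(x)
    return np.array(dictionary)


def features(dictionary, x, width=1.0):
    """Kernel feature vector of state x with respect to the dictionary."""
    return np.array([gaussian_kernel(d, x, width) for d in dictionary])


def td_critic_update(alpha, dictionary, x, cost, x_next,
                     gamma=0.9, lr=0.1, width=1.0):
    """One TD(0) step for the critic V(x) = alpha . features(x)."""
    phi = features(dictionary, x, width)
    phi_next = features(dictionary, x_next, width)
    td_error = cost + gamma * (alpha @ phi_next) - alpha @ phi
    return alpha + lr * td_error * phi, td_error


if __name__ == "__main__":
    # Toy stand-in for a plant (NOT the bioreactor model): x' = 0.5 x, cost x^2.
    rng = np.random.default_rng(0)
    states = rng.uniform(-1.0, 1.0, size=(200, 1))
    dictionary = build_dictionary(states, mu=0.01, width=0.5)
    alpha = np.zeros(len(dictionary))
    for x in states:
        alpha, _ = td_critic_update(alpha, dictionary, x, float(x @ x), 0.5 * x,
                                    width=0.5)
    print("samples:", len(states), "dictionary size:", len(dictionary))
```

The point of the sketch is that the critic's basis functions are selected from the data by the sparsification test rather than designed by hand, which is the contrast the abstract draws with manually designed neural-network critics.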

Chuanqiang Lian | Xin Xu | Zhenhua Huang | Lei Zuo
