Policy search reinforcement learning for automatic wet clutch engagement

Most existing motion control algorithms track a reference trajectory, based on a continuous measurement of the system's response. In many industrial applications, however, it is either impossible or too expensive to install sensors that measure the system's output over the complete stroke; instead, the motion can only be detected at certain discrete positions. The control objective in these systems is often not to track a complete trajectory accurately, but rather to achieve a given state at the sensor locations (e.g. to pass by the sensor at a given time, or with a given speed). Model-based control strategies are not suited to the control of these systems, due to the lack of sensor data. We are currently investigating the potential of a non-model-based learning strategy, Reinforcement Learning (RL), in dealing with this kind of discrete sensor information. Here, we describe ongoing experiments with a wet clutch, which has to be engaged smoothly yet quickly, without any feedback on piston position.
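To make the discrete-sensor objective concrete, the following is a minimal sketch of direct policy search on a toy problem. It is not the method or the plant used in the clutch experiments: the unit-mass model, the sensor position, the target passing time, and the greedy random-perturbation search are all assumptions chosen for illustration. The learner never observes the trajectory; after each trial it receives one scalar reward derived from the time at which the sensor is passed.

```python
import math
import random

SENSOR_POS = 1.0   # hypothetical sensor location along the stroke
TARGET_TIME = 1.0  # hypothetical desired time of passing the sensor


def passing_time(accel):
    """Time at which a unit mass, starting at rest under constant
    acceleration, reaches the sensor (toy plant, assumed here)."""
    return math.sqrt(2.0 * SENSOR_POS / accel)


def reward(accel):
    """One scalar reward per trial, based only on the sensor event."""
    return -(passing_time(accel) - TARGET_TIME) ** 2


def policy_search(initial_accel, trials=200, sigma=0.2, seed=0):
    """Greedy random-perturbation search over a single policy parameter.

    Each trial perturbs the current best parameter with Gaussian noise
    and keeps the candidate only if its reward is strictly higher.
    """
    rng = random.Random(seed)
    best, best_reward = initial_accel, reward(initial_accel)
    for _ in range(trials):
        # Keep the acceleration positive so the sensor is always reached.
        candidate = max(1e-3, best + rng.gauss(0.0, sigma))
        r = reward(candidate)
        if r > best_reward:
            best, best_reward = candidate, r
    return best, best_reward


best_accel, best_reward = policy_search(initial_accel=0.5)
print(f"acceleration={best_accel:.3f}, "
      f"passing time={passing_time(best_accel):.3f}")
```

In this toy setting the search only needs the reward signal, so the same loop would apply unchanged if `reward` were replaced by a measurement from a real trial. Gradient-based policy search methods would replace the greedy acceptance step with an update of the sampling distribution.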
