Imitation Reinforcement Learning-Based Remote Rotary Inverted Pendulum Control in OpenFlow Network

The rotary inverted pendulum is an unstable, highly nonlinear device that serves as a common benchmark in the nonlinear control engineering field. In this paper, we use a rotary inverted pendulum as a deep reinforcement learning environment. The experimental system consists of a cyber environment and a physical environment built on an OpenFlow network, and the MQTT protocol runs over the Ethernet connection to link the two environments. The reinforcement learning agent is trained to control the real device located remotely from the controller, and a classical PID controller is also used to implement imitation reinforcement learning and facilitate the learning process. With our CPS-based experimental system, we verify that a deep reinforcement learning agent can successfully control a real device located remotely from the agent, and that our imitation learning strategy effectively reduces the learning time.
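The paper does not specify its MQTT message format, so the following is only a plausible sketch of how the physical side might serialize sensor samples and parse control commands for the cyber–physical link. The topic names (`pendulum/state`, `pendulum/action`), field names, and units are assumptions; in a real deployment these payloads would be passed to an MQTT client library such as paho-mqtt via its `publish`/`subscribe` calls.

```python
import json

# Hypothetical topic names for the cyber-physical link
# (assumptions for illustration, not taken from the paper).
STATE_TOPIC = "pendulum/state"
ACTION_TOPIC = "pendulum/action"

def encode_state(arm_angle, pendulum_angle, arm_vel, pendulum_vel, t):
    """Serialize one sensor sample on the physical side.
    A real deployment would hand this string to the MQTT client,
    e.g. client.publish(STATE_TOPIC, payload)."""
    return json.dumps({
        "t": t,                     # timestamp [s]
        "theta": arm_angle,         # rotary arm angle [rad]
        "alpha": pendulum_angle,    # pendulum angle from upright [rad]
        "theta_dot": arm_vel,       # arm angular velocity [rad/s]
        "alpha_dot": pendulum_vel,  # pendulum angular velocity [rad/s]
    })

def encode_action(u, t):
    """Serialize the agent's control command on the cyber side."""
    return json.dumps({"t": t, "u": u})

def decode_action(payload):
    """Parse a control command received on ACTION_TOPIC on the physical side."""
    msg = json.loads(payload)
    return float(msg["u"])          # motor command (e.g. normalized voltage)
```

Keeping the payloads as small JSON objects keeps the per-message latency low over the OpenFlow-managed Ethernet path, which matters because the control loop of an unstable plant runs on a millisecond-scale sampling period.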
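One common way to combine a classical PID controller with reinforcement learning, consistent with the imitation strategy described above, is to roll out the PID controller as a demonstrator and store its transitions in the agent's replay buffer before RL training begins. The sketch below illustrates this idea on a toy linearized inverted pendulum; the gains, dynamics, and reward are illustrative assumptions, not the paper's actual values.

```python
from collections import deque

class PID:
    """Classical PID demonstrator; the gains used below are illustrative."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def act(self, error):
        if self.prev_error is None:      # avoid a derivative kick on the first call
            self.prev_error = error
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def seed_replay_buffer(pid, step_fn, state, n_steps, buffer):
    """Roll out the PID demonstrator and store (s, a, r, s') transitions,
    so the RL agent starts from expert-like data instead of random exploration."""
    for _ in range(n_steps):
        action = pid.act(-state[0])      # drive the pendulum angle toward zero
        next_state, reward = step_fn(state, action)
        buffer.append((state, action, reward, next_state))
        state = next_state
    return state

# Toy linearized inverted-pendulum dynamics (a stand-in for the real device).
DT = 0.01
def toy_step(state, action):
    theta, omega = state
    theta_next = theta + DT * omega
    omega_next = omega + DT * (9.8 * theta + action)  # unstable without control
    reward = -abs(theta_next)                         # illustrative reward shaping
    return (theta_next, omega_next), reward

pid = PID(kp=20.0, ki=0.5, kd=5.0, dt=DT)
replay = deque(maxlen=10000)
final_state = seed_replay_buffer(pid, toy_step, (0.2, 0.0), 200, replay)
```

After seeding, a DQN-style agent would sample minibatches from `replay` as usual; because the early transitions already reach the balanced region, the agent spends far fewer episodes discovering it on its own, which is the mechanism behind the reduced learning time reported in the abstract.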
