Learning Hybrid Models to Control a Ball in a Circular Maze

This paper presents the problem of learning a model to navigate a ball to a goal state in a circular maze environment with two degrees of freedom. The motion of the ball in the maze is influenced by several nonlinear effects, such as friction and contacts, which are difficult to model. We propose a hybrid model that estimates the dynamics of the ball in the maze using Gaussian process regression equipped with basis functions obtained from physics first principles. We compare the accuracy of the hybrid model against standard model-learning algorithms to highlight its efficacy. The learned model is then used to design trajectories for the ball with a trajectory-optimization algorithm. We also hope that the system presented in this paper can serve as a benchmark problem for reinforcement and robot learning, given its interesting, challenging dynamics and its ease of reproducibility.
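The core idea of such a hybrid (semi-parametric) model is to fit the known physics basis functions first and let a Gaussian process absorb the residual dynamics the physics cannot explain. The following is a minimal 1-D sketch of that idea; the "true" dynamics, the single linear basis function, and all kernel parameters are invented for illustration and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, length=0.5, variance=1.0):
    """Squared-exponential kernel between two 1-D point sets."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length**2)

rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 30)
# Hypothetical "true" dynamics: a linear physics term plus an
# unmodeled nonlinearity (standing in for friction/contact effects).
y_train = 2.0 * x_train + 0.3 * np.sin(5.0 * x_train) \
          + 0.01 * rng.standard_normal(x_train.size)

# Step 1: least-squares fit of the physics basis (here a single
# linear feature; a real system would use several basis functions).
Phi = x_train[:, None]
w, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)

# Step 2: fit a GP to the residuals the physics model cannot explain.
resid = y_train - Phi @ w
K = rbf_kernel(x_train, x_train) + 1e-4 * np.eye(x_train.size)
alpha = np.linalg.solve(K, resid)

def hybrid_predict(x_test):
    """Physics prediction plus the GP posterior-mean correction."""
    return x_test * w[0] + rbf_kernel(x_test, x_train) @ alpha
```

On held-out inputs, the hybrid prediction should track the nonlinear dynamics more closely than the physics-only fit `x * w[0]`, which is the benefit the paper attributes to combining first-principles structure with data-driven regression.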
