Contextual Online Learning Selection of Finite State Machines for Mobile Robots

As mobile robots are increasingly used for individual purposes, they must adapt to many kinds of complex environments and handle large-scale data. In this paper, we propose a contextual online learning algorithm with a ball partition of the context space, to solve the problem of selecting a suitable finite state machine for each kind of region encountered by an individual mobile robot. Based on multi-armed bandits, the algorithm uses historical records and an adaptive partition of the context space, allowing mobile robots to process large amounts of data and adapt to a variety of complex environments. We prove that the cumulative regret of our algorithm has a sublinear bound, and simulation results show that the selection system scales to large data volumes and a wide range of complex environments.
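The abstract gives no pseudocode, so as a rough illustration of the idea, the following is a minimal Python sketch of a UCB-style contextual bandit over an adaptive ball partition of the context space, where each arm corresponds to one candidate finite state machine. The class names (`Ball`, `BallPartitionBandit`), the UCB index, the split heuristic, and the `simulate_fsm` reward stub are all illustrative assumptions, not the authors' algorithm.

```python
import math
import random

class Ball:
    """A region of context space holding per-arm reward statistics."""
    def __init__(self, center, radius, n_arms):
        self.center = center
        self.radius = radius
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def contains(self, context):
        return math.dist(self.center, context) <= self.radius

class BallPartitionBandit:
    """UCB-style contextual bandit over an adaptive ball partition.

    Each arm stands for one candidate finite state machine; a context is
    a feature vector describing the robot's current region. Statistics
    are kept per ball, and a well-sampled ball spawns a finer child
    ball, so the partition adapts to where contexts actually arrive.
    """
    def __init__(self, n_arms, init_radius=1.0, split_threshold=50):
        self.n_arms = n_arms
        self.init_radius = init_radius
        self.split_threshold = split_threshold
        self.balls = []
        self.t = 0

    def _find_ball(self, context):
        # Use the smallest (most refined) ball covering the context,
        # creating a fresh ball if none covers it yet.
        covering = [b for b in self.balls if b.contains(context)]
        if covering:
            return min(covering, key=lambda b: b.radius)
        ball = Ball(tuple(context), self.init_radius, self.n_arms)
        self.balls.append(ball)
        return ball

    def select_arm(self, context):
        self.t += 1
        ball = self._find_ball(context)
        # Play every arm once inside a new ball before trusting UCB scores.
        for a in range(self.n_arms):
            if ball.counts[a] == 0:
                return a, ball
        def ucb(a):
            mean = ball.sums[a] / ball.counts[a]
            bonus = math.sqrt(2.0 * math.log(self.t) / ball.counts[a])
            return mean + bonus
        return max(range(self.n_arms), key=ucb), ball

    def update(self, ball, arm, reward, context):
        ball.counts[arm] += 1
        ball.sums[arm] += reward
        # Adaptive refinement: once a ball has enough samples, add a
        # half-radius child centred on the latest context.
        if sum(ball.counts) >= self.split_threshold and ball.radius > 0.05:
            self.balls.append(Ball(tuple(context), ball.radius / 2.0, self.n_arms))

def simulate_fsm(arm, context):
    # Placeholder reward: stands in for running the chosen FSM in the
    # robot's current region and measuring its performance.
    return random.random() * (1.0 + 0.1 * arm)

bandit = BallPartitionBandit(n_arms=3)
for _ in range(1000):
    ctx = (random.random(), random.random())  # e.g. terrain/traffic features
    arm, ball = bandit.select_arm(ctx)
    bandit.update(ball, arm, simulate_fsm(arm, ctx), ctx)
```

Centring each new child ball on the most recent context concentrates refinement in the parts of context space the robot actually visits, which mirrors the adaptive-partition idea described in the abstract.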
