Multi-armed recommendation bandits for selecting state machine policies for robotic systems

We investigate the problem of selecting a state machine from a library to control a robot, and we are particularly interested in the setting where evaluating a state machine on a given robotics task is expensive. As a motivating example, we consider a problem in which a simulated vacuuming robot must select a driving state machine well suited to a particular (unknown) room layout. Borrowing concepts from collaborative filtering (as used in recommender systems such as those of Netflix and Amazon.com), we present a multi-armed bandit formulation that incorporates recommendation techniques to efficiently select state machines for individual room layouts. We show that this formulation outperforms each individual approach (recommendation alone, multi-armed bandits alone) as well as the baseline of selecting the "average best" state machine across all rooms.
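The abstract does not specify the exact algorithm, but the core idea it describes, warm-starting a bandit with collaborative-filtering predictions, can be sketched as follows. In this illustrative sketch (all function names, the cosine-similarity neighborhood recommender, and the way predictions seed the bandit are assumptions, not the paper's method), rows of a performance matrix are previously evaluated room layouts, columns are state machines, and a UCB-style bandit chooses which state machine to evaluate next on a new room:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical performance matrix: rows = known room layouts,
# columns = candidate state machines; entries = coverage scores.
R = rng.uniform(0.2, 1.0, size=(20, 8))

def recommend_scores(partial, R):
    """Predict all scores for a new room from its partial observations,
    using a simple cosine-similarity neighborhood recommender."""
    observed = ~np.isnan(partial)
    if not observed.any():
        return R.mean(axis=0)  # no evaluations yet: fall back to averages
    sims = np.zeros(len(R))
    for i, row in enumerate(R):
        den = np.linalg.norm(row[observed]) * np.linalg.norm(partial[observed])
        sims[i] = row[observed] @ partial[observed] / den if den > 0 else 0.0
    w = sims / sims.sum() if sims.sum() > 0 else np.ones(len(R)) / len(R)
    return w @ R  # similarity-weighted blend of known rooms' score rows

def ucb_with_recommendation(true_scores, R, horizon=40, c=0.5):
    """UCB-style bandit whose estimates for untried state machines are
    seeded by the recommender (an assumed, illustrative combination)."""
    k = len(true_scores)
    counts = np.zeros(k)
    means = np.zeros(k)
    partial = np.full(k, np.nan)  # this room's observations so far
    for t in range(1, horizon + 1):
        prior = recommend_scores(partial, R)
        # Use the recommender's prediction until an arm has been tried.
        est = np.where(counts > 0, means, prior)
        bonus = c * np.sqrt(np.log(t + 1) / np.maximum(counts, 1.0))
        arm = int(np.argmax(est + bonus))
        reward = true_scores[arm] + rng.normal(0, 0.05)  # noisy evaluation
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        partial[arm] = means[arm]
    return int(np.argmax(np.where(counts > 0, means, -np.inf)))
```

The design point the sketch illustrates is the one the abstract argues for: the recommender substitutes informed predictions for the uniform initial estimates of a plain bandit, so expensive evaluations are spent on state machines that performed well in similar rooms rather than on exhaustive exploration.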
