Multi-armed recommendation bandits for selecting state machine policies for robotic systems

We investigate the problem of selecting a state machine from a library to control a robot, and we are particularly interested in the setting where evaluating a state machine on a given robotics task is expensive. As a motivating example, we consider a problem in which a simulated vacuuming robot must select a driving state machine well suited to a particular (unknown) room layout. Borrowing concepts from collaborative filtering (as used in recommender systems such as those of Netflix and Amazon.com), we present a multi-armed bandit formulation that incorporates recommendation techniques to efficiently select state machines for individual room layouts. We show that this formulation outperforms each individual approach (recommendation alone, multi-armed bandits alone) as well as the baseline of selecting the "average best" state machine across all rooms.
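The abstract does not specify the exact algorithm, but the core idea it describes, warm-starting a bandit with collaborative-filtering predictions, can be sketched as follows. In this illustrative sketch (all function names, the cosine-similarity neighborhood recommender, and the way predictions seed the bandit are assumptions, not the paper's method), rows of a performance matrix are previously evaluated room layouts, columns are state machines, and a UCB-style bandit chooses which state machine to evaluate next on a new room:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical performance matrix: rows = known room layouts,
# columns = candidate state machines; entries = coverage scores.
R = rng.uniform(0.2, 1.0, size=(20, 8))

def recommend_scores(partial, R):
    """Predict all scores for a new room from its partial observations,
    using a simple cosine-similarity neighborhood recommender."""
    observed = ~np.isnan(partial)
    if not observed.any():
        return R.mean(axis=0)  # no evaluations yet: fall back to averages
    sims = np.zeros(len(R))
    for i, row in enumerate(R):
        den = np.linalg.norm(row[observed]) * np.linalg.norm(partial[observed])
        sims[i] = row[observed] @ partial[observed] / den if den > 0 else 0.0
    w = sims / sims.sum() if sims.sum() > 0 else np.ones(len(R)) / len(R)
    return w @ R  # similarity-weighted blend of known rooms' score rows

def ucb_with_recommendation(true_scores, R, horizon=40, c=0.5):
    """UCB-style bandit whose estimates for untried state machines are
    seeded by the recommender (an assumed, illustrative combination)."""
    k = len(true_scores)
    counts = np.zeros(k)
    means = np.zeros(k)
    partial = np.full(k, np.nan)  # this room's observations so far
    for t in range(1, horizon + 1):
        prior = recommend_scores(partial, R)
        # Use the recommender's prediction until an arm has been tried.
        est = np.where(counts > 0, means, prior)
        bonus = c * np.sqrt(np.log(t + 1) / np.maximum(counts, 1.0))
        arm = int(np.argmax(est + bonus))
        reward = true_scores[arm] + rng.normal(0, 0.05)  # noisy evaluation
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        partial[arm] = means[arm]
    return int(np.argmax(np.where(counts > 0, means, -np.inf)))
```

The design point the sketch illustrates is the one the abstract argues for: the recommender substitutes informed predictions for the uniform initial estimates of a plain bandit, so expensive evaluations are spent on state machines that performed well in similar rooms rather than on exhaustive exploration.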
