Paper Information - Leveraging Post Hoc Context for Faster Learning in Bandit Settings with Applications in Robot-Assisted Feeding

Autonomous robot-assisted feeding requires the ability to acquire a wide variety of food items. However, it is impossible for such a system to be trained on all types of food in existence. Therefore, a key challenge is choosing a manipulation strategy for a previously unseen food item. Previous work showed that the problem can be represented as a linear bandit with visual context. However, food has a wide variety of multi-modal properties relevant to manipulation that can be hard to distinguish visually. Our key insight is that we can leverage the haptic context we collect during and after manipulation (i.e., "post hoc") to learn some of these properties and more quickly adapt our visual model to previously unseen food. In general, we propose a modified linear contextual bandit framework augmented with post hoc context observed after action selection to empirically increase learning speed and reduce cumulative regret. Experiments on synthetic data demonstrate that this effect is more pronounced when the dimensionality of the context is large relative to the post hoc context or when the post hoc context model is particularly easy to learn. Finally, we apply this framework to the bite acquisition problem and demonstrate the acquisition of 8 previously unseen types of food with 21% fewer failures across 64 attempts.
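As a rough illustration of the "linear bandit with visual context" framing mentioned above, the following is a minimal sketch of a standard LinUCB-style learner, not the method proposed in the paper. It omits the post hoc context augmentation entirely, since the abstract does not specify how that component is modeled. The class `LinUCBArm`, the functions `select_action` and `run_simulation`, the exploration weight `alpha`, and the toy reward weights are all hypothetical names and values chosen for this example.

```python
import math
import random


class LinUCBArm:
    """Ridge-regression reward model for one action (hypothetical helper)."""

    def __init__(self, dim, reg=1.0):
        # A_inv is the inverse of (reg * I + sum of x x^T); b is the sum of reward * x.
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        # Estimated reward plus an exploration bonus scaled by alpha.
        theta = self._A_inv_times(self.b)
        mean = sum(t * xi for t, xi in zip(theta, x))
        Ax = self._A_inv_times(x)
        width = math.sqrt(max(sum(xi * a for xi, a in zip(x, Ax)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv (A_inv is symmetric).
        Ax = self._A_inv_times(x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, Ax))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]


def select_action(arms, x, alpha=1.0):
    """Pick the action with the highest upper confidence bound for context x."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def run_simulation(rounds=500, seed=0):
    """Toy loop: two actions, 2-D context, Bernoulli success/failure rewards."""
    rng = random.Random(seed)
    true_weights = [[1.0, 0.0], [0.0, 1.0]]  # made-up values, for illustration only
    arms = [LinUCBArm(dim=2) for _ in true_weights]
    successes = 0
    for _ in range(rounds):
        x = [rng.random(), rng.random()]      # stand-in for visual features
        a = select_action(arms, x)
        p = sum(w * xi for w, xi in zip(true_weights[a], x))
        reward = 1.0 if rng.random() < p else 0.0
        arms[a].update(x, reward)             # only the chosen action gets feedback
        successes += reward
    return arms, successes
```

Calling `run_simulation()` returns the fitted per-action models and the number of successful attempts. In the setting the abstract describes, the context would come from an image of the food item and each action would be a manipulation strategy; both are replaced here by synthetic stand-ins.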
