PlanIt: A crowdsourcing approach for learning to plan paths from large scale preference feedback

We consider the problem of learning user preferences over robot trajectories in environments rich in objects and humans. This is challenging because the criterion defining a good trajectory varies with users, tasks, and interactions in the environment. We represent trajectory preferences with a cost function that the robot learns and then uses to generate good trajectories in new environments. We design a crowdsourcing system, PlanIt, in which non-expert users label segments of the robot's trajectory. PlanIt allows us to collect a large amount of user feedback, and we learn the parameters of our model from the weak and noisy labels it provides. We test our approach on 122 different environments for robotic navigation and manipulation tasks. Our extensive experiments show that the learned cost function generates preferred trajectories in human environments. Our crowdsourcing system is publicly available for visualizing the learned costs and for providing preference feedback: http://planit.cs.cornell.edu.
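The idea of fitting a cost function to noisy segment labels and then using it to rank candidate trajectories can be illustrated with a small sketch. This is not the model used in PlanIt: plain logistic regression stands in for the paper's learning method, and the two features (closeness to a human, closeness to an object), the toy labels, and all function names are assumptions made purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_segment_cost(features, labels, lr=0.5, epochs=2000):
    """Fit weights (last entry is a bias) by gradient descent on the logistic loss."""
    dim = len(features[0])
    w = [0.0] * (dim + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(features, labels):
            xb = list(x) + [1.0]  # append constant bias feature
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) - y
            for j, xj in enumerate(xb):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def trajectory_cost(w, trajectory):
    """Sum the predicted 'disliked' probability over all waypoints."""
    return sum(
        sigmoid(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))
        for x in trajectory
    )

def pick_preferred(w, candidates):
    """Return the candidate trajectory with the lowest learned cost."""
    return min(candidates, key=lambda t: trajectory_cost(w, t))

# Hypothetical segment features: [closeness to human, closeness to object],
# each in [0, 1]. Label 1 = users marked the segment as bad, 0 = acceptable.
features = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.9],   # close to a human, labeled bad
    [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],   # far from the human, labeled fine
    [0.85, 0.15],                         # noisy label: close, yet labeled fine
    [0.15, 0.1],
]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

w = fit_segment_cost(features, labels)

# Two hypothetical candidate trajectories, each a list of waypoint features.
near = [[0.9, 0.2], [0.8, 0.1]]   # passes close to the human
far = [[0.2, 0.2], [0.1, 0.1]]    # keeps its distance

best = pick_preferred(w, [near, far])
print("learned weights:", w)
print("preferred trajectory keeps its distance:", best is far)
```

Despite the one contradictory label, the fitted weight on closeness to the human comes out positive, so the trajectory that keeps its distance receives the lower cost. A real system would need richer features and a planner that optimizes the learned cost rather than choosing among two fixed candidates.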
