Learning Visual Object Categories for Robot Affordance Prediction

A fundamental requirement for any autonomous robot is the ability to predict the affordances of its environment: the set of affordances defines the actions available to the robot in its current context. A standard approach to affordance learning is direct perception, which learns direct mappings from sensor measurements to affordance labels. For example, a robot designed for cross-country navigation could map stereo depth information and image features directly into predictions about the traversability of terrain regions. While this approach can succeed for a small number of affordances, it does not scale well as the number of affordances grows. In this paper, we show that visual object categories can serve as an intermediate representation that makes the affordance learning problem scalable. We develop a probabilistic graphical model, the Category-Affordance (CA) model, which describes the relationships between object categories, affordances, and appearance. This model casts visual object categorization as an intermediate inference step in affordance prediction. We describe several novel learning and training strategies supported by the CA model. Experimental results with indoor mobile robots evaluate these strategies and demonstrate the advantages of the CA model for affordance learning, especially when learning from limited-size data sets.
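To make the role of categories as an intermediate representation concrete, the following is a minimal sketch of the inference the abstract describes, written as a math block. The abstract does not give the CA model's actual factorization, so the chain structure below, with appearance X, category C, and affordance A, and with A conditionally independent of X given C, is an illustrative assumption, not the paper's stated model:

% Assumed factorization (not stated in the abstract): the category C
% screens off the affordance A from the raw appearance X.
\[
P(X, C, A) = P(C)\, P(X \mid C)\, P(A \mid C)
\]
% Affordance prediction then marginalizes over the intermediate category:
\[
P(A \mid X) = \sum_{c} P(A \mid C = c)\, P(C = c \mid X)
\]

Under this reading, the scalability claim has a natural explanation: each additional affordance requires learning only one new conditional distribution over a fixed set of categories, rather than a new direct mapping from high-dimensional sensor measurements.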
