Intention Inference and Decision Making with Hierarchical Gaussian Process Dynamics Models

Anticipation is crucial for fluent human-robot interaction: it allows a robot to independently coordinate its actions with human beings in joint activities. An anticipatory robot relies on a predictive model of its human partners and selects its own actions according to the model's predictions. Intention inference and decision making are key elements of such anticipatory robots. In this thesis, we present a machine-learning approach to intention inference and decision making based on Hierarchical Gaussian Process Dynamics Models (H-GPDMs).

We first introduce the H-GPDM, a class of generic latent-variable dynamics models. The H-GPDM represents the generative process of complex human movements that are directed by exogenous driving factors. By incorporating the exogenous variables into the dynamics model, the H-GPDM improves the interpretation, analysis, and prediction of human movements. Because exact inference of the exogenous variables and the latent states is intractable, we introduce an approximate method based on variational Bayesian inference and demonstrate the merits of the H-GPDM in three different applications of human movement analysis. The H-GPDM lays the foundation for the subsequent studies on intention inference and decision making.

Intention inference is an essential step towards anticipatory robots. For this purpose, we consider a special case of the H-GPDM, the Intention-Driven Dynamics Model (IDDM), which treats the human partners' intention as the exogenous driving factor. The IDDM supports intention inference from observed movements via Bayes' theorem, with the latent state variables marginalized out. As most robotics applications are subject to real-time constraints, we introduce an efficient online algorithm that allows for real-time intention inference. We show that the IDDM achieves state-of-the-art performance in intention inference in two human-robot interaction scenarios: target prediction for robot table tennis and action recognition for interactive robots.

Decision making based on a time series of predictions allows a robot to be proactive in its action selection. This involves a trade-off between the accuracy and confidence of the prediction and the time available for executing the selected action. To address the problem of action selection and the optimal timing for initiating the movement, we formulate anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), in which the H-GPDM is used to update the belief state and to estimate the transition model. We present two approaches to policy learning and decision making, and show their effectiveness in human-robot table tennis. In addition, we consider decision making based solely on the preferences of the human partners, for cases where observations are not sufficient for reliable intention inference. We formulate this setting as a repeated game and present an approach to learning safe strategies that exploit the humans' preferences. The learned strategy enables action selection when reliable intention inference is not available due to insufficient observation, e.g., when a robot must return balls served by a human table tennis player.

Throughout this thesis, we use human-robot table tennis as a running example, where a key bottleneck is the limited amount of time available for executing a hitting movement. Movement initiation usually requires an early decision on the type of action, such as a forehand or backhand hitting movement, at least 80 ms before the opponent has hit the ball. The robot therefore needs to anticipate the opponent's intended target and act proactively. Using the proposed methods, the robot can predict the intended target of the opponent and initiate an appropriate hitting movement according to the prediction. Experimental results, obtained both in a physically realistic simulation and on a real Barrett WAM robot arm with seven degrees of freedom, show that the proposed intention inference and decision making methods can substantially enhance the capability of the robot table tennis player.
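
To make the interplay between online intention inference and the timing of action selection more concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes a small discrete set of intentions and replaces the marginal likelihood of the IDDM with a hypothetical one-dimensional Gaussian observation model per intention. The names `update_belief` and `infer_and_decide`, the means, the noise level, and the confidence threshold are all illustrative assumptions.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian, used as a stand-in likelihood."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, observation, means, std):
    """One recursive Bayes step: posterior is proportional to likelihood times prior."""
    unnormalized = {
        g: belief[g] * gaussian_pdf(observation, means[g], std) for g in belief
    }
    total = sum(unnormalized.values())
    return {g: p / total for g, p in unnormalized.items()}


def infer_and_decide(observations, means, std=1.0, threshold=0.9):
    """Update the belief after each observation and commit to an action
    as soon as one intention is sufficiently probable (assumed stopping rule)."""
    belief = {g: 1.0 / len(means) for g in means}  # uniform prior over intentions
    for t, z in enumerate(observations):
        belief = update_belief(belief, z, means, std)
        best = max(belief, key=belief.get)
        if belief[best] >= threshold:
            return best, t, belief  # confident enough: initiate the movement
    return None, len(observations) - 1, belief  # never confident: no decision


# Hypothetical example: two intentions with assumed feature means.
means = {"forehand": 1.0, "backhand": -1.0}
action, step, belief = infer_and_decide([0.8, 1.1, 0.9, 1.2], means)
print(action, step)
```

Waiting for more observations raises the confidence of the prediction but leaves less time to execute the movement; the fixed threshold here is only a crude placeholder for the learned policies discussed in the abstract.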
