Model learning for robot control: a survey

Models, such as kinematics and dynamics models of the robot's own body and of controllable external objects, are among the most essential tools in robotics. It is widely believed that intelligent mammals also rely on internal models to generate their actions. However, while classical robotics relies on manually derived models based on human insight into physics, future autonomous, cognitive robots need to generate models automatically from information extracted from the data streams available to the robot. In this paper, we survey the progress in model learning with a strong focus on robot control at both the kinematic and the dynamic level. Here, a model describes essential information about the behavior of the environment and the influence of an agent on this environment. In the context of model-based learning control, we view the model from three perspectives. First, we study the possible model learning architectures for robotics. Second, we discuss which problems these architectures and the robotics domain pose for the applicable learning methods; from this discussion, we derive future directions for real-time learning algorithms. Third, we show, in several case studies, where these scenarios have been applied successfully.
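To make the notion of a model concrete, the sketch below learns a forward model of a toy system online and then inverts it to choose an action. Here a model is a mapping from the current state and the agent's action to the resulting behavior of the environment. This is a minimal illustration under stated assumptions, not the survey's own method:

- The one-dimensional linear plant and its coefficients (a = 0.9, b = 0.5) are hypothetical.
- Recursive least squares is used as a simple linear stand-in for the real-time learning algorithms the survey discusses.
- The names RecursiveLeastSquares, plant and inverse_model are illustrative only.

```python
import numpy as np


class RecursiveLeastSquares:
    """Incrementally fits y ~ theta . phi, one sample at a time.

    Illustrative helper (hypothetical name); the cost per update is O(dim^2),
    independent of how many samples have already been seen.
    """

    def __init__(self, dim, forgetting=1.0, init_cov=1e3):
        self.theta = np.zeros(dim)       # current parameter estimate
        self.P = np.eye(dim) * init_cov  # scaled inverse of the input correlation matrix
        self.lam = forgetting            # 1.0 means old data is never forgotten

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        error = y - phi @ self.theta     # a-priori prediction error
        self.theta = self.theta + gain * error
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return error


def plant(x, u, a=0.9, b=0.5):
    """Hypothetical 1-D system x' = a*x + b*u; the learner never sees a or b."""
    return a * x + b * u


def inverse_model(model, x, x_desired):
    """Action predicted to move x to x_desired, obtained by inverting the learned forward model."""
    a_hat, b_hat = model.theta
    return (x_desired - a_hat * x) / b_hat


# Learn the forward model (x, u) -> x' online from a stream of noisy samples.
rng = np.random.default_rng(0)
model = RecursiveLeastSquares(dim=2)
x = 0.0
for _ in range(200):
    u = rng.uniform(-1.0, 1.0)                           # exploratory action
    x_next = plant(x, u) + 0.01 * rng.standard_normal()  # measured next state
    model.update([x, u], x_next)
    x = x_next

print("learned a, b:", model.theta)  # close to 0.9 and 0.5
print("next state after inverse-model action:", plant(1.0, inverse_model(model, 1.0, 0.0)))
```

Each update costs a fixed amount of work no matter how much data has streamed in. That property is what makes incremental learners of this kind attractive for real-time control. The random actions act as exploratory excitation, so both coefficients of the forward model remain identifiable from the data.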
