Machine learning through exploration for perception-driven robotics