Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes

We introduce a new formulation of the Hidden Parameter Markov Decision Process (HiP-MDP), a framework for modeling families of related tasks using low-dimensional latent embeddings. Our new framework correctly models the joint uncertainty in the latent parameters and the state space. We also replace the original Gaussian Process-based model with a Bayesian Neural Network, enabling more scalable inference. Thus, we expand the scope of the HiP-MDP to applications with higher dimensions and more complex dynamics.
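As a rough illustration of what it means to model the latent parameters jointly with the state, the sketch below feeds the state, the action, and a task-specific latent embedding into one transition network as a single concatenated input. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, layer sizes, one-hot action encoding, and the choice to predict a state change are all hypothetical. The network is a deterministic point estimate standing in for the Bayesian Neural Network, which would place a distribution over the weights.

```python
import numpy as np

# Hypothetical sizes and names; none of these come from the paper.
def init_params(state_dim, action_dim, latent_dim, hidden=16, seed=0):
    """Random weights for a one-hidden-layer transition network."""
    rng = np.random.default_rng(seed)
    in_dim = state_dim + action_dim + latent_dim
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, state_dim)),
        "b2": np.zeros(state_dim),
    }

def predict_next_state(params, state, action_onehot, latent):
    """Predict the next state from (state, action, latent) as ONE joint input.

    Assumption: the network outputs a state change that is added to the
    current state. A Bayesian Neural Network would additionally carry
    uncertainty over W1, b1, W2, b2; this sketch uses fixed weights.
    """
    x = np.concatenate([state, action_onehot, latent])
    h = np.tanh(x @ params["W1"] + params["b1"])
    delta = h @ params["W2"] + params["b2"]
    return state + delta

# Two task instances share the network weights but differ in their embedding.
params = init_params(state_dim=4, action_dim=2, latent_dim=3)
state = np.zeros(4)
action = np.array([1.0, 0.0])
next_a = predict_next_state(params, state, action, np.array([0.5, -0.2, 0.1]))
next_b = predict_next_state(params, state, action, np.array([-0.5, 0.2, 0.9]))
```

Because the embedding passes through the same nonlinear layers as the state, its effect on the prediction can depend on the state, which is the kind of interaction a joint treatment is meant to capture.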
