Paper information - Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes

We introduce a new formulation of the Hidden Parameter Markov Decision Process (HiP-MDP), a framework for modeling families of related tasks using low-dimensional latent embeddings. Our new framework correctly models the joint uncertainty in the latent parameters and the state space. We also replace the original Gaussian Process-based model with a Bayesian Neural Network, enabling more scalable inference. Thus, we expand the scope of the HiP-MDP to applications with higher dimensions and more complex dynamics.
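As a rough illustration of the idea of a low-dimensional latent embedding shared-model setup, the sketch below feeds a per-task latent vector into a single transition network alongside the state and action, so that one set of weights can describe a family of related tasks. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`init_params`, `predict_next_state`), the layer sizes, the `tanh` nonlinearity, and the task names are all hypothetical, and the network uses plain point-estimate weights rather than the Bayesian Neural Network described above.

```python
import numpy as np


def init_params(rng, state_dim, action_dim, latent_dim, hidden=16):
    """Create weights for a small one-hidden-layer network (hypothetical sizes)."""
    in_dim = state_dim + action_dim + latent_dim
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, state_dim)),
        "b2": np.zeros(state_dim),
    }


def predict_next_state(params, state, action, latent):
    """Predict the next state from the state, the action, and a task latent.

    The latent vector is concatenated with the state and action, so the
    prediction depends on all three inputs jointly. Weights are point
    estimates here (an assumption made for brevity), not a posterior.
    """
    x = np.concatenate([state, action, latent])
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


rng = np.random.default_rng(0)
params = init_params(rng, state_dim=4, action_dim=1, latent_dim=2)

# One low-dimensional embedding per task instance; the weights are shared.
task_latents = {
    "task_a": rng.normal(size=2),
    "task_b": rng.normal(size=2),
}

state = np.zeros(4)
action = np.array([1.0])
next_a = predict_next_state(params, state, action, task_latents["task_a"])
next_b = predict_next_state(params, state, action, task_latents["task_b"])
```

With the same state and action, the two tasks yield different predicted next states because only the latent input differs. In a fuller treatment the latent vector for a new task would be inferred from a handful of observed transitions while the shared weights stay fixed; that inference step is omitted here.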

[1] Benjamin Rosman,et al. Bayesian policy reuse , 2015, Machine Learning.

[2] David J. Fleet,et al. Gaussian Process Dynamical Models , 2005, NIPS.

[3] Edwin V. Bonilla,et al. Multi-task Gaussian Process Prediction , 2007, NIPS.

[4] Carl E. Rasmussen,et al. Gaussian Processes in Reinforcement Learning , 2003, NIPS.

[5] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.

[6] David Hsu,et al. Planning how to learn , 2013, 2013 IEEE International Conference on Robotics and Automation.

[7] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[8] Zoubin Ghahramani,et al. Sparse Gaussian Processes using Pseudo-inputs , 2005, NIPS.

[9] Lihong Li,et al. Sample Complexity of Multi-task Reinforcement Learning , 2013, UAI.

[10] Larry D. Pyeatt,et al. Reinforcement learning for closed-loop propofol anesthesia: a study in human volunteers , 2014, J. Mach. Learn. Res..

[11] Finale Doshi-Velez,et al. Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations , 2013, IJCAI.

[12] C. Rasmussen,et al. Improving PILCO with Bayesian Neural Network Dynamics Models , 2016 .

[13] Alan Fern,et al. A Computational Decision Theory for Interactive Assistants , 2010, Interactive Decision Theory and Game Theory.

[14] Joaquin Quiñonero-Candela,et al. Learning with Uncertainty: Gaussian Processes and Relevance Vector Machines , 2004 .

[15] Akane Sano,et al. Multi-task , Multi-Kernel Learning for Estimating Individual Wellbeing , 2015 .

[16] Steve Young,et al. Scaling POMDPs for dialog management with composite summary point-based value iteration (CSPBVI) , 2006 .

[17] Darwin G. Caldwell,et al. Transfer learning of shared latent spaces between robots with similar kinematic structure , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[18] C. R. Dietrich,et al. Fast and Exact Simulation of Stationary Gaussian Processes through Circulant Embedding of the Covariance Matrix , 1997, SIAM J. Sci. Comput..

[19] Carl E. Rasmussen,et al. Gaussian Processes for Data-Efficient Learning in Robotics and Control , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20] Qiang Yang,et al. Adaptive Transfer Learning , 2010, AAAI.

[21] Sergey Levine,et al. Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning , 2017, ICLR.

[22] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[23] Trevor Darrell,et al. Discriminative Gaussian process latent variable model for classification , 2007, ICML '07.

[24] Marc G. Genton,et al. Cross-Covariance Functions for Multivariate Geostatistics , 2015, 1507.08017.

[25] Carl E. Rasmussen,et al. A Unifying View of Sparse Approximate Gaussian Process Regression , 2005, J. Mach. Learn. Res..

[26] Alex Kendall,et al. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[27] Louis Wehenkel,et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach , 2006, Proceedings of the 45th IEEE Conference on Decision and Control.

[28] Min Chen,et al. POMDP-lite for robust robot planning under uncertainty , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[29] Zoubin Ghahramani,et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[30] P. Cochat,et al. Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[31] Neil D. Lawrence,et al. Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data , 2003, NIPS.

[32] Rich Caruana,et al. Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[33] Richard E. Turner,et al. Black-box α-divergence minimization , 2016, ICML 2016.

[34] Marc Peter Deisenroth,et al. Expectation Propagation in Gaussian Process Dynamical Systems , 2012, NIPS.

[35] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[36] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.

[37] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[38] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[39] Neil D. Lawrence,et al. Kernels for Vector-Valued Functions: a Review , 2011, Found. Trends Mach. Learn..

[40] Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.

[41] Radford M. Neal. Bayesian training of backpropagation networks by the hybrid Monte-Carlo method , 1992 .

[42] Catholijn M. Jonker,et al. Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning , 2017, ArXiv.

[43] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[44] Suchi Saria,et al. Integrative Analysis using Coupled Latent Variable Models for Individualizing Prognoses , 2016, J. Mach. Learn. Res..

[45] Julien Cornebise,et al. Weight Uncertainty in Neural Networks , 2015, ArXiv.

[46] Joelle Pineau,et al. Informing sequential clinical decision-making through reinforcement learning: an empirical study , 2010, Machine Learning.

[47] David J. C. MacKay,et al. A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[48] Samantha Kleinberg,et al. Causal Explanation Under Indeterminism: A Sampling Approach , 2016, AAAI.

[49] Michael L. Littman,et al. Quantifying Uncertainty in Batch Personalized Sequential Decision Making , 2014, AAAI Workshop: Modern Artificial Intelligence for Health Analytics.

[50] B. Adams,et al. Dynamic multidrug therapies for hiv: optimal and sti control approaches. , 2004, Mathematical biosciences and engineering : MBE.

[51] José Miguel Hernández-Lobato,et al. Uncertainty Decomposition in Bayesian Neural Networks with Latent Variables , 2017, 1706.08495.

[52] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[53] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[54] Finale Doshi-Velez,et al. Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks , 2016, ICLR.