[1] Andrea Bonarini, et al. Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods, 2007, NIPS.
[2] Rajesh P. N. Rao, et al. Reward Optimization in the Primate Brain: A Probabilistic Model of Decision Making under Uncertainty, 2013, PLoS ONE.
[3] Jeffrey K. Uhlmann, et al. Unscented filtering and nonlinear estimation, 2004, Proceedings of the IEEE.
[4] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[5] Satinder Singh, et al. Computational Rationality: Linking Mechanism and Behavior Through Bounded Utility Maximization, 2014, Top. Cogn. Sci.
[6] D. Wolpert, et al. Cognitive Tomography Reveals Complex, Task-Independent Mental Representations, 2013, Current Biology.
[7] Alexander J. Smola, et al. Meta-Q-Learning, 2020, ICLR.
[8] Paul Schrater, et al. Inverse POMDP: Inferring What You Think from What You Do, 2018, arXiv.
[9] Eyal Amir, et al. Bayesian Inverse Reinforcement Learning, 2007, IJCAI.
[10] Karl J. Friston, et al. Observing the Observer (II): Deciding When to Decide, 2010, PLoS ONE.
[11] Wolfram Burgard, et al. Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics, 2016, AISTATS.
[12] S. Shankar Sastry, et al. Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning, 2020, ICLR.
[13] Karl Johan Åström, et al. Optimal control of Markov processes with incomplete state information, 1965.
[14] Xaq Pitkow, et al. Tracking the Mind's Eye: Primate Gaze Behavior during Virtual Visuomotor Navigation Reflects Belief Dynamics, 2020, Neuron.
[15] Zhengwei Wu, et al. Rational thoughts in neural codes, 2020, Proceedings of the National Academy of Sciences.
[16] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[17] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[18] Katja Hofmann, et al. Fast Context Adaptation via Meta-Learning, 2018, ICML.
[19] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[20] Kee-Eung Kim, et al. Inverse Reinforcement Learning in Partially Observable Environments, 2009, IJCAI.
[21] Thomas L. Griffiths, et al. Cognitive Model Priors for Predicting Human Decisions, 2019, ICML.
[22] Joshua B. Tenenbaum, et al. Bayesian Theory of Mind: Modeling Joint Belief-Desire Attribution, 2011, CogSci.
[23] Michael L. Littman, et al. Apprenticeship Learning About Multiple Intentions, 2011, ICML.
[24] Constantin A. Rothkopf, et al. I See What You See: Inferring Sensor and Policy Models of Human Real-World Motor Behavior, 2017, AAAI.
[25] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[26] Stefan Schaal, et al. Learning objective functions for manipulation, 2013, ICRA.
[27] Rajesh P. N. Rao, et al. Decision Making Under Uncertainty: A Neural Model Based on Partially Observable Markov Decision Processes, 2010, Front. Comput. Neurosci.
[28] Jonathan Tompson, et al. ADAIL: Adaptive Adversarial Imitation Learning, 2020, arXiv.
[29] Stuart J. Russell. Learning agents for uncertain environments (extended abstract), 1998, COLT '98.
[30] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[31] A. P. Dempster, et al. Maximum Likelihood from Incomplete Data via the EM Algorithm, 1977, Journal of the Royal Statistical Society, Series B.
[32] Gašper Tkačik, et al. Inferring the function performed by a recurrent neural network, 2019, PLoS ONE.
[33] Yoshua Bengio, et al. Bayesian Model-Agnostic Meta-Learning, 2018, NeurIPS.
[34] Benjamin Beyret, et al. The Animal-AI Olympics, 2019, Nature Machine Intelligence.
[35] P. Dayan, et al. Decision theory, reinforcement learning, and the brain, 2008, Cognitive, Affective, & Behavioral Neuroscience.
[36] Karl J. Friston, et al. Observing the Observer (I): Meta-Bayesian Models of Learning and Decision-Making, 2010, PLoS ONE.
[37] Kee-Eung Kim, et al. A Bayesian Approach to Generative Adversarial Imitation Learning, 2018, NeurIPS.
[38] Rajesh P. N. Rao, et al. Bayesian Brain: Probabilistic Approaches to Neural Coding, 2006.
[39] Chris L. Baker, et al. Rational quantitative attribution of beliefs, desires and percepts in human mentalizing, 2017, Nature Human Behaviour.
[40] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[41] Jan Peters, et al. Relative Entropy Inverse Reinforcement Learning, 2011, AISTATS.
[42] Emanuel Todorov, et al. Inverse Optimal Control with Linearly-Solvable MDPs, 2010, ICML.
[43] Geoffrey E. Hinton, et al. Massively Parallel Architectures for AI: NETL, Thistle, and Boltzmann Machines, 1983, AAAI.
[44] Jo van Nunen, et al. A set of successive approximation methods for discounted Markovian decision problems, 1976, Math. Methods Oper. Res.
[45] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[46] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[47] Maneesh Sahani, et al. Flexible and accurate inference and learning for deep generative models, 2018, NeurIPS.
[48] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.
[49] Falk Lieder, et al. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources, 2019, Behavioral and Brain Sciences.
[50] E. Jaynes. Information Theory and Statistical Mechanics, 1957.
[51] Gregory C. DeAngelis, et al. A Dynamic Bayesian Observer Model Reveals Origins of Bias in Visual Path Integration, 2017, Neuron.
[52] Monica C. Vroman. Maximum Likelihood Inverse Reinforcement Learning, 2014.
[53] Christopher M. Bishop. Pattern Recognition and Machine Learning, 2006, Springer.
[54] J. Andrew Bagnell, et al. Maximum margin planning, 2006, ICML.
[55] Sanjit A. Seshia, et al. Learning Task Specifications from Demonstrations, 2017, NeurIPS.
[56] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[57] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[58] Anca D. Dragan, et al. Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior, 2018, NeurIPS.
[59] A. Pouget, et al. Not Noisy, Just Wrong: The Role of Suboptimal Inference in Behavioral Variability, 2012, Neuron.
[60] Richard L. Lewis, et al. Rational adaptation under task and processing constraints: implications for testing theories of cognition and action, 2009, Psychological Review.
[61] R. Bellman. A Markovian Decision Process, 1957.
[62] H. Sebastian Seung, et al. Q-Learning for Continuous Actions with Cross-Entropy Guided Policies, 2019, arXiv.
[63] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[64] Tanmay Gangwani, et al. State-only Imitation with Transition Dynamics Mismatch, 2020, ICLR.
[65] Thomas L. Griffiths, et al. Inferring Learners' Knowledge From Their Actions, 2015, Cogn. Sci.
[66] Sergey Levine, et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, 2016, ICML.
[67] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[68] Yee Whye Teh, et al. Meta reinforcement learning as task inference, 2019, arXiv.