A Distributional Perspective on Reinforcement Learning

[1] S. C. Jaquette. Markov Decision Processes with a New Optimality Criterion: Discrete Time, 1973.
[2] D. Freedman, et al. Some Asymptotic Theory for the Bootstrap, 1981.
[3] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[4] M. J. Sobel, et al. Discounted MDP's: distribution functions and exponential utility maximization, 1987.
[5] D. White. Mean, variance, and probabilistic criteria in finite Markov decision processes: A review, 1988.
[6] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[7] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[8] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[9] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[10] Rich Caruana, et al. Multitask Learning, 1998, Encyclopedia of Machine Learning and Data Mining.
[11] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[12] Uwe Rösler. A Fixed Point Theorem for Distributions, 1999.
[13] Paul E. Utgoff, et al. Many-Layered Learning, 2002, Neural Computation.
[14] John N. Tsitsiklis, et al. On the Convergence of Optimistic Policy Iteration, 2002, J. Mach. Learn. Res.
[15] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[18] Marc Toussaint, et al. Probabilistic inference for solving discrete and continuous state Markov Decision Processes, 2006, ICML.
[19] Michael Bowling, et al. Dual Representations for Dynamic Programming, 2008.
[20] Nando de Freitas, et al. An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward, 2009, AISTATS.
[21] Masashi Sugiyama, et al. Nonparametric Return Distribution Approximation for Reinforcement Learning, 2010, ICML.
[22] Matthieu Geist, et al. Kalman Temporal Differences, 2010, J. Artif. Intell. Res.
[23] Masashi Sugiyama, et al. Parametric Return Density Estimation for Reinforcement Learning, 2010, UAI.
[24] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[25] P. Schrimpf, et al. Dynamic Programming, 2011.
[26] John N. Tsitsiklis, et al. Mean-Variance Optimization in Markov Decision Processes, 2011, ICML.
[27] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[28] J. Norris. Appendix: probability and measure, 1997.
[29] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[30] Mohammad Ghavamzadeh, et al. Actor-Critic Algorithms for Risk-Sensitive MDPs, 2013, NIPS.
[31] Marc G. Bellemare, et al. Compress and Control, 2015, AAAI.
[32] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[33] Shane Legg, et al. Massively Parallel Methods for Deep Reinforcement Learning, 2015, ArXiv.
[34] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[35] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[36] Marc G. Bellemare, et al. Q(λ) with Off-Policy Corrections, 2016, ALT.
[37] Shie Mannor, et al. Learning the Variance of the Reward-To-Go, 2016, J. Mach. Learn. Res.
[38] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[39] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.
[40] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[41] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[42] Marc G. Bellemare, et al. The Cramér Distance as a Solution to Biased Wasserstein Gradients, 2017, ArXiv.
[43] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.