Learning the Arrow of Time for Problems in Reinforcement Learning
Yoshua Bengio | Anirudh Goyal | Steffen Wolf | Nasim Rahaman | Roman Remme
[1] Marc Pollefeys,et al. Episodic Curiosity through Reachability , 2018, ICLR.
[2] D. Kinderlehrer,et al. THE VARIATIONAL FORMULATION OF THE FOKKER-PLANCK EQUATION , 1996 .
[3] Alexei A. Efros,et al. Large-Scale Study of Curiosity-Driven Learning , 2018, ICLR.
[4] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[5] Bernhard Schölkopf,et al. Detecting the direction of causal time series , 2009, ICML.
[6] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[7] Aram Galstyan,et al. Efficient Estimation of Mutual Information for Strongly Dependent Variables , 2014, AISTATS.
[8] Nuttapong Chentanez,et al. Intrinsically Motivated Reinforcement Learning , 2004, NIPS.
[9] U. Seifert. Stochastic thermodynamics, fluctuation theorems and molecular machines , 2012, Reports on Progress in Physics.
[10] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[11] Bernhard Schölkopf,et al. Seeing the Arrow of Time , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[12] Laurent Orseau,et al. Measuring and avoiding side effects using relative reachability , 2018, ArXiv.
[13] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[14] A. Eddington. The Nature of the Physical World , 1928 .
[15] B. Schölkopf,et al. Algorithmic independence of initial condition and dynamical law in thermodynamics and causal inference , 2015, arXiv:1512.02057.
[16] Andrew Zisserman,et al. Learning and Using the Arrow of Time , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] A. Kraskov,et al. Estimating mutual information. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.
[18] Stefan Bauer,et al. The Arrow of Time in Multivariate Time Series , 2016, ICML.
[19] A. Savitzky,et al. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. , 1964 .
[20] Yoshua Bengio,et al. Unsupervised State Representation Learning in Atari , 2019, NeurIPS.
[21] J. Willems. Dissipative dynamical systems part I: General theory , 1972 .
[22] Ankush Gupta,et al. Unsupervised Learning of Object Keypoints for Perception and Control , 2019, NeurIPS.
[23] Zurek,et al. Algorithmic randomness and physical entropy. , 1989, Physical review. A, General physics.
[24] Pieter Abbeel,et al. Safe Exploration in Markov Decision Processes , 2012, ICML.
[25] Bernhard Schölkopf,et al. Elements of Causal Inference: Foundations and Learning Algorithms , 2017 .
[26] John Schulman,et al. Concrete Problems in AI Safety , 2016, ArXiv.
[27] Dominik Janzing,et al. On the Entropy Production of Time Series with Unidirectional Linearity , 2009, arXiv:0908.1861.
[28] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[29] Sergey Levine,et al. Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning , 2017, ICLR.
[30] Stuart Armstrong,et al. Low Impact Artificial Intelligences , 2017, ArXiv.
[31] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[32] Jürgen Schmidhuber,et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.
[33] Shane Legg,et al. Noisy Networks for Exploration , 2017, ICLR.