Learning the Arrow of Time

We humans seem to have an innate understanding of the asymmetric progression of time, which we use to perceive and manipulate our environment efficiently and safely. Drawing inspiration from this, we address the problem of learning an arrow of time in a Markov (Decision) Process. We illustrate how a learned arrow of time can capture meaningful information about the environment, which in turn can be used to measure reachability, detect side effects, and obtain an intrinsic reward signal. We show empirical results on a selection of discrete and continuous environments, and demonstrate for a class of stochastic processes that the learned arrow of time agrees reasonably well with a known notion of an arrow of time given by the celebrated Jordan-Kinderlehrer-Otto result.
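
For context on the result invoked above, the Jordan-Kinderlehrer-Otto (JKO) theorem (reference [28] below) can be summarized as follows; this is standard background, and the symbols \Psi, \beta, and \tau follow common usage rather than anything defined in this abstract. For a diffusion whose density \rho evolves under a Fokker-Planck equation with drift potential \Psi and inverse temperature \beta, the free energy

    F[\rho] \;=\; \int \Psi(x)\,\rho(x)\,dx \;+\; \beta^{-1}\int \rho(x)\,\log\rho(x)\,dx

is non-increasing along solutions, and the density evolution is recovered as the Wasserstein gradient flow of F via the implicit scheme

    \rho_{k+1} \;\in\; \arg\min_{\rho}\;\Big\{ \tfrac{1}{2\tau}\,W_2^2(\rho, \rho_k) \;+\; F[\rho] \Big\},

where W_2 is the 2-Wasserstein distance and \tau the step size. The monotone decrease of F supplies the "known notion of an arrow of time" against which a learned arrow of time can be compared.

As a purely illustrative sketch of what learning an arrow of time in an MDP could look like operationally, one might parameterize a scalar potential h_theta(s) over states and train it to increase, in expectation, along observed forward transitions. The network, objective, and regularizer below are assumptions made for this sketch (written in PyTorch) and are not taken from the paper.

import torch
import torch.nn as nn

class HPotential(nn.Module):
    """Hypothetical scalar potential h_theta(s) over states."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def train_step(h, optimizer, s, s_next, reg=1e-2):
    # Encourage h to rise along observed forward transitions (s -> s_next);
    # the quadratic penalty keeps the increments from growing without bound.
    gain = h(s_next) - h(s)
    loss = -gain.mean() + reg * (gain ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage on placeholder data:
# h = HPotential(state_dim=4)
# opt = torch.optim.Adam(h.parameters(), lr=1e-3)
# s, s_next = torch.randn(128, 4), torch.randn(128, 4)
# train_step(h, opt, s, s_next)

A trained potential of this kind could then serve as a rough proxy for reachability (states with much lower h than the current one tend to be hard to return to) or as an intrinsic reward, which is the role described in the abstract.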

[1] Jürgen Schmidhuber. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010), 2010, IEEE Transactions on Autonomous Mental Development.

[2] U. Seifert. Stochastic thermodynamics, fluctuation theorems and molecular machines, 2012, Reports on Progress in Physics.

[3] Sergey Levine, et al. Time Reversal as Self-Supervision, 2018, ArXiv.

[4] William Thomson. Kinetic Theory of the Dissipation of Energy, 1874, Nature.

[5] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.

[6] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.

[7] B. Schölkopf, et al. Algorithmic independence of initial condition and dynamical law in thermodynamics and causal inference, 2015, arXiv:1512.02057.

[8] Andrew Zisserman, et al. Learning and Using the Arrow of Time, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Bernhard Schölkopf, et al. Seeing the Arrow of Time, 2014, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.

[11] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.

[12] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.

[13] Laurent Orseau, et al. Measuring and avoiding side effects using relative reachability, 2018, ArXiv.

[14] A. Kraskov, et al. Estimating mutual information, 2003, Physical Review E.

[15] A. Savitzky, et al. Smoothing and Differentiation of Data by Simplified Least Squares Procedures, 1964, Analytical Chemistry.

[16] Sergey Levine, et al. Recall Traces: Backtracking Models for Efficient Reinforcement Learning, 2018, ICLR.

[17] Pieter Abbeel, et al. Safe Exploration in Markov Decision Processes, 2012, ICML.

[18] Bernhard Schölkopf, et al. Elements of Causal Inference: Foundations and Learning Algorithms, 2017.

[19] W. H. Zurek. Algorithmic randomness and physical entropy, 1989, Physical Review A.

[20] Sergey Levine, et al. Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning, 2017, ICLR.

[21] Jürgen Schmidhuber, et al. World Models, 2018, ArXiv.

[22] Wojciech H. Zurek. Decoherence, chaos, quantum-classical correspondence, and the algorithmic arrow of time, 1998.

[23] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.

[24] A. M. Lyapunov. The general problem of the stability of motion, 1992.

[25] Aram Galstyan, et al. Efficient Estimation of Mutual Information for Strongly Dependent Variables, 2014, AISTATS.

[26] Stefan Bauer, et al. The Arrow of Time in Multivariate Time Series, 2016, ICML.

[27] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[28] R. Jordan, D. Kinderlehrer, and F. Otto. The Variational Formulation of the Fokker-Planck Equation, 1996.

[29] Weihai Zhang, et al. Stability of Nonlinear Stochastic Discrete-Time Systems, 2013, Journal of Applied Mathematics.

[30] A. S. Eddington. The Nature of the Physical World, 1928.

[31] Marc Pollefeys, et al. Episodic Curiosity through Reachability, 2018, ICLR.

[32] J. Willems. Dissipative dynamical systems Part I: General theory, 1972.

[33] Kirk T. McDonald, et al. A Damped Oscillator as a Hamiltonian System, 2015.

[34] G. Crooks. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences, 1999, Physical Review E.

[35] Laurent Orseau, et al. AI Safety Gridworlds, 2017, ArXiv.

[36] Stuart Armstrong, et al. Low Impact Artificial Intelligences, 2017, ArXiv.

[37] John Schulman, et al. Concrete Problems in AI Safety, 2016, ArXiv.

[38] Dominik Janzing. On the Entropy Production of Time Series with Unidirectional Linearity, 2009, arXiv:0908.1861.