Transfer-Entropy-Regularized Markov Decision Processes

We consider the framework of transfer-entropy-regularized Markov decision processes (TERMDPs), in which the weighted sum of the classical state-dependent cost and the transfer entropy from the state random process to the control random process is minimized. Although TERMDPs are generally formulated as nonconvex optimization problems, we derive an analytical necessary optimality condition expressed as a finite set of nonlinear equations. Based on this condition, we propose an iterative forward-backward computational procedure similar to the Arimoto-Blahut algorithm. We show that every limit point of the sequence generated by the proposed algorithm is a stationary point of the TERMDP. Applications of TERMDPs are discussed in the context of networked control systems theory and non-equilibrium thermodynamics. Finally, the proposed algorithm is applied to an information-constrained maze navigation problem, through which we study how the price of information qualitatively alters the optimal decision policies.
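The abstract describes the proposed procedure only as being "similar to the Arimoto-Blahut algorithm." As a reference point, here is a minimal sketch of the classical Blahut-Arimoto iteration for one point on the rate-distortion curve of a discrete memoryless source. This is not the TERMDP algorithm, whose forward-backward update equations are not given here. It only illustrates the alternating structure the abstract alludes to: fix one distribution, compute the optimal other one in closed form, and repeat. The function name `blahut_arimoto_rd` and its parameters (`s` as the slope or Lagrange multiplier, `n_iter`, `tol`) are illustrative choices made for this sketch.

```python
import numpy as np


def blahut_arimoto_rd(p_x, dist, s, n_iter=500, tol=1e-12):
    """Classical Blahut-Arimoto iteration (rate-distortion version).

    Illustrative sketch only; names and defaults are hypothetical.
    p_x  : source distribution p(x), shape (n,)
    dist : distortion matrix d(x, xhat), shape (n, m)
    s    : slope / Lagrange multiplier, s >= 0
    Returns (rate in bits, expected distortion, test channel Q(xhat | x)).
    """
    p_x = np.asarray(p_x, dtype=float)
    dist = np.asarray(dist, dtype=float)
    m = dist.shape[1]
    q = np.full(m, 1.0 / m)          # output marginal q(xhat), start uniform
    kernel = np.exp(-s * dist)       # exp(-s * d(x, xhat)), strictly positive

    for _ in range(n_iter):
        # Step 1: optimal test channel Q(xhat | x) for the current q(xhat)
        Q = q * kernel                       # broadcast q over rows
        Q /= Q.sum(axis=1, keepdims=True)    # normalize each row
        # Step 2: optimal output marginal q(xhat) = sum_x p(x) Q(xhat | x)
        q_new = p_x @ Q
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break

    # Recompute the channel and its exact output marginal from the final q
    Q = q * kernel
    Q /= Q.sum(axis=1, keepdims=True)
    q = p_x @ Q
    joint = p_x[:, None] * Q                 # p(x) Q(xhat | x)
    mask = joint > 0
    # Mutual information I(X; Xhat) in bits
    rate = float(np.sum(joint[mask] * np.log2((Q / q)[mask])))
    distortion = float(np.sum(joint * dist))
    return rate, distortion, Q


if __name__ == "__main__":
    # Uniform binary source with Hamming distortion
    p = np.array([0.5, 0.5])
    hamming = 1.0 - np.eye(2)
    r, d, _ = blahut_arimoto_rd(p, hamming, s=np.log(3.0))
    print(f"rate = {r:.4f} bits, distortion = {d:.4f}")
```

For a uniform binary source with Hamming distortion and `s = log 3`, the iteration converges to distortion 0.25 and rate 1 - h(0.25), about 0.189 bits, which matches the known closed-form rate-distortion function of that source.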
