Measuring Structural Similarities in Finite MDPs

In this paper, we investigate structural similarities within a finite Markov decision process (MDP). We view a finite MDP as a heterogeneous directed bipartite graph and propose novel measures of state similarity and action similarity that reinforce each other. We prove that the state similarity is a metric and the action similarity is a pseudometric, and we establish a connection between the proposed similarity measures and the optimal values of the MDP. Extensive experiments demonstrate that the proposed measures are effective.
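
To make the idea of mutually reinforcing state and action similarities concrete, the following is a minimal sketch of one plausible instantiation in the spirit of SimRank-style mutual reinforcement and bisimulation-style metrics. It is not the paper's exact definitions: the toy MDP (P, R), the mixing weight c, the surrogate for comparing transition distributions, and the Hausdorff-style state update are all illustrative assumptions.

```python
# Illustrative sketch (NOT the paper's exact measures): view a finite MDP as a
# bipartite graph over states and state-action pairs, then alternate mutually
# reinforcing dissimilarity updates between the two node types.

import itertools
import numpy as np

# Toy MDP: 3 states, 2 actions per state (assumed for illustration).
# P[s, a] is a distribution over next states, R[s, a] a scalar reward.
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # shape (S, A, S)
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

c = 0.8          # assumed mixing weight between reward term and successor term
n_iters = 50

# d_S[s, t]: dissimilarity between states s and t.
# d_A[s, a, t, b]: dissimilarity between actions (s, a) and (t, b).
d_S = np.zeros((n_states, n_states))
d_A = np.zeros((n_states, n_actions, n_states, n_actions))

for _ in range(n_iters):
    # Action update: reward gap plus expected successor dissimilarity
    # (a crude surrogate for a Wasserstein/EMD coupling of P[s, a] and P[t, b]).
    for s, a, t, b in itertools.product(range(n_states), range(n_actions),
                                        range(n_states), range(n_actions)):
        succ_term = float(P[s, a] @ d_S @ P[t, b])
        d_A[s, a, t, b] = (1 - c) * abs(R[s, a] - R[t, b]) + c * succ_term

    # State update: best-match each state's actions against the other's
    # (Hausdorff-style max of minima), so two states are close when every
    # action of one has a close counterpart in the other.
    for s, t in itertools.product(range(n_states), range(n_states)):
        forward = max(d_A[s, a, t].min() for a in range(n_actions))
        backward = max(d_A[t, b, s].min() for b in range(n_actions))
        d_S[s, t] = max(forward, backward)

print("state dissimilarity matrix:\n", np.round(d_S, 3))
```

Small dissimilarity values indicate structurally similar states; converting to a similarity score (e.g. 1 - d_S) or proving metric properties depends on the specific update rules, which here are only stand-ins for the measures proposed in the paper.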
