Paper Information - Measuring Structural Similarities in Finite MDPs

Measuring Structural Similarities in Finite MDPs

In this paper, we investigate the structural similarities within a finite Markov decision process (MDP). We view a finite MDP as a heterogeneous directed bipartite graph and propose novel measures for the state and action similarities, in a mutually reinforced manner. We prove that the state similarity is a metric and the action similarity is a pseudometric. We also establish the connection between the proposed similarity measures and the optimal values of the MDP. Extensive experiments show that the proposed measures are effective.
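The graph view mentioned in the abstract can be illustrated with a minimal sketch. It only shows one plausible way to lay out a finite MDP as a directed bipartite graph with state nodes and action nodes; it does not implement the paper's similarity measures, which are not specified on this page. The function name `mdp_to_bipartite_graph`, the input layout, and the toy MDP are all assumptions made for illustration.

```python
from collections import defaultdict


def mdp_to_bipartite_graph(transitions):
    """Build a directed bipartite graph from a finite MDP (illustrative only).

    transitions: {(state, action): {next_state: probability}}
    Returns two edge maps:
      state_to_action: state node -> list of action nodes available there
      action_to_state: action node -> {next state node: transition probability}
    Each action node is represented as the pair (state, action), an assumption
    of this sketch rather than something stated in the source.
    """
    state_to_action = defaultdict(list)
    action_to_state = {}
    for (state, action), next_dist in transitions.items():
        action_node = (state, action)
        state_to_action[state].append(action_node)    # edge: state -> action
        action_to_state[action_node] = dict(next_dist)  # edges: action -> states
    return dict(state_to_action), action_to_state


# Hypothetical toy MDP with two states and three state-action pairs.
toy_transitions = {
    ("s0", "left"): {"s0": 1.0},
    ("s0", "right"): {"s1": 0.9, "s0": 0.1},
    ("s1", "stay"): {"s1": 1.0},
}

state_edges, action_edges = mdp_to_bipartite_graph(toy_transitions)
for state, action_nodes in state_edges.items():
    for action_node in action_nodes:
        print(state, "->", action_node, "->", action_edges[action_node])
```

In this layout every edge runs either from a state node to an action node or from an action node to a state node, which is what makes the graph bipartite. Rewards are omitted here; how they enter the graph is left to the paper.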
