A Taxonomy of Similarity Metrics for Markov Decision Processes

Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been studied in the context of transfer learning. Transfer is based on the idea of reusing the knowledge acquired while learning a set of source tasks in a new learning process on a target task, under the assumption that the source and target tasks are sufficiently close. In recent years, transfer learning has succeeded in making Reinforcement Learning (RL) algorithms more efficient, for example by reducing the number of samples needed to reach (near-)optimal performance. Transfer in RL rests on the core concept of similarity: whenever tasks are similar, transferred knowledge can be reused to solve the target task and significantly improve learning performance. The selection of good metrics to measure these similarities is therefore a critical aspect of building transfer RL algorithms, especially when knowledge is transferred from simulation to the real world. In the literature, there are many metrics for measuring the similarity between Markov Decision Processes (MDPs), and hence many definitions of similarity, or of its complement, distance, have been considered. In this paper, we propose a categorization of these metrics and analyze the definitions of similarity proposed so far under that categorization. We also follow this taxonomy to survey the existing literature and to suggest future directions for the construction of new metrics.
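
To make the object of study concrete, consider one canonical state-similarity metric that several of the surveyed families build on: the bisimulation metric of Ferns, Panangaden, and Precup for finite MDPs. The formulation below is an illustrative sketch rather than a definition introduced in this paper; R denotes the reward function, P the transition kernel, and c_R, c_T are the usual nonnegative weights from that line of work. The metric d is the unique fixed point of

\[
d(s, s') \;=\; \max_{a \in A} \Big( c_R \, \big| R(s,a) - R(s',a) \big| \;+\; c_T \, W_1(d)\big( P(\cdot \mid s, a),\, P(\cdot \mid s', a) \big) \Big),
\]

where W_1(d) is the 1-Wasserstein (Kantorovich) distance between transition distributions computed with d as the ground metric, and 0 <= c_T < 1 makes the operator a contraction, so the fixed point exists and is unique. For positive weights, states at distance zero are exactly the bisimilar ones, and lifting such state metrics to distances between whole MDPs is one natural route to the task-level similarity notions categorized in this paper.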
