Optimal Policies for Quantum Markov Decision Processes

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide mathematical tools for reinforcement learning techniques applied to the quantum world.
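For orientation, the finite-horizon dynamic programming idea can be illustrated on a classical MDP. The sketch below is the standard backward induction for classical MDPs, not the qMDP algorithm developed in the paper. The two-state model, the action names, and the function name `backward_induction` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy MDP (an assumption for illustration, not taken from the paper).
# P[s][a] lists (next_state, probability) pairs; R[s][a] is the immediate reward.
P = {
    0: {"stay": [(0, 1.0)], "move": [(1, 0.5), (0, 0.5)]},
    1: {"stay": [(1, 1.0)], "move": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "move": 0.0},
    1: {"stay": 1.0, "move": 0.0},
}


def backward_induction(P, R, horizon):
    """Classical finite-horizon dynamic programming.

    Returns (value, policy): value[s] is the optimal expected total reward
    from state s with `horizon` steps remaining, and policy[t][s] is the
    optimal action in state s at decision step t (t = 0 is the first step).
    """
    states = list(P)
    value = {s: 0.0 for s in states}  # nothing left to earn at the horizon
    policy = []
    for _ in range(horizon):
        new_value, decision = {}, {}
        for s in states:
            best_action, best_q = None, float("-inf")
            for a in P[s]:
                # Immediate reward plus expected value of the successor state.
                q = R[s][a] + sum(p * value[s2] for s2, p in P[s][a])
                if q > best_q:
                    best_action, best_q = a, q
            new_value[s], decision[s] = best_q, best_action
        value = new_value
        policy.insert(0, decision)  # earlier decision steps go to the front
    return value, policy


value, policy = backward_induction(P, R, horizon=2)
print(value)      # optimal two-step values per state
print(policy[0])  # optimal first-step action per state
```

The recursion runs backwards from the horizon, so each step reuses the values already computed for the remaining steps. In the quantum setting described in the abstract, the states and transitions are quantum objects, so this classical sketch should be read only as an analogy.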
