Optimal Policies for Quantum Markov Decision Processes

The Markov decision process (MDP) offers a general framework for modelling sequential decision making under random outcomes. In particular, it serves as the mathematical foundation of reinforcement learning. This paper introduces an extension of the MDP, the quantum MDP (qMDP), which can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. These results provide mathematical tools for reinforcement learning techniques applied to the quantum world.
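
For orientation, the sketch below shows finite-horizon dynamic programming (backward induction) for an ordinary classical MDP. It is only the classical analogue that the abstract alludes to, not the paper's qMDP algorithm. The function name `backward_induction`, the data layout (`P[a][s][t]` for transition probabilities, `R[s][a]` for rewards), and the two-state example are all illustrative assumptions.

```python
def backward_induction(P, R, horizon):
    """Finite-horizon dynamic programming for a classical MDP (illustrative).

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[s][a]:    immediate reward for taking action a in state s.
    Returns the optimal values at time 0 and a time-indexed policy.
    """
    n_actions = len(P)
    n_states = len(R)
    V = [0.0] * n_states          # value after the final step is zero
    policy = []
    for _ in range(horizon):      # sweep backwards from the last step
        new_V, decision = [], []
        for s in range(n_states):
            # Expected return of each action: reward now plus value later.
            q = [
                R[s][a] + sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])
            decision.append(best)
            new_V.append(q[best])
        V = new_V
        policy.insert(0, decision)  # earlier time steps go to the front
    return V, policy


# Hypothetical toy problem: action 0 stays put, action 1 switches state;
# being in state 1 pays a reward of 1 per step.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [[0.0, 0.0], [1.0, 1.0]]
values, policy = backward_induction(P, R, horizon=3)
```

In this toy problem the first-step policy switches out of state 0 and stays in state 1, which is the behaviour one would expect when only state 1 is rewarded.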
