Taking gradients through experiments: LSTMs and memory proximal policy optimization for black-box quantum control

In this work we introduce black-box quantum control to the machine learning community as an interesting reinforcement learning problem. We analyze the structure of the reinforcement learning problems arising in quantum physics and argue that agents parameterized by long short-term memory (LSTM) networks and trained via stochastic policy gradients yield a general method for solving them. Based on this analysis, we introduce a variant of the proximal policy optimization (PPO) algorithm called memory proximal policy optimization (MPPO). We then show how it can be applied to specific learning tasks and present results of numerical experiments showing that our method achieves state-of-the-art results for several learning tasks in quantum control with discrete and continuous control parameters.
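For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that PPO maximizes. It is not the MPPO variant described above, whose details are not given in this abstract. The function name `ppo_clipped_objective`, its argument names, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    All names and the default clip_eps are illustrative assumptions;
    this is generic PPO, not the paper's MPPO variant.
    """
    # Probability ratio between the updated policy and the sampling policy.
    ratio = np.exp(logp_new - logp_old)
    # Unclipped and clipped surrogate terms, evaluated per sample.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise minimum gives a pessimistic bound on the improvement.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. The clipping only takes effect once the updated policy moves away from the one that generated the samples.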
