Stochastic gradient descent for hybrid quantum-classical optimization

In the context of hybrid quantum-classical optimization, gradient-descent-based optimizers typically require the evaluation of expectation values with respect to the output states of parameterized quantum circuits. In this work, we explore the consequences of the prior observation that estimating these quantities on quantum hardware results in a form of stochastic gradient descent optimization. We formalize this notion, which allows us to show that in many relevant cases, including VQE, QAOA, and certain quantum classifiers, estimating expectation values with $k$ measurement outcomes yields optimization algorithms whose convergence properties can be rigorously understood, for any value of $k$. In fact, even single measurement outcomes suffice for the estimation of expectation values. Moreover, in many settings the required gradients can be expressed as linear combinations of expectation values -- originating, e.g., from a sum over local terms of a Hamiltonian, a parameter-shift rule, or a sum over data-set instances -- and we show that in these cases $k$-shot expectation value estimation can be combined with sampling over terms of the linear combination to obtain "doubly stochastic" gradient descent optimizers. For all of these algorithms we prove convergence guarantees, providing a framework for the derivation of rigorous optimization results in the context of near-term quantum devices. Additionally, we numerically explore these methods on benchmark VQE, QAOA, and quantum-enhanced machine-learning tasks and show that treating the stochastic settings as hyperparameters allows for state-of-the-art results with significantly fewer circuit executions and measurements.
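To make the construction concrete, the following is a minimal NumPy sketch (not the paper's implementation) of such a doubly stochastic gradient estimator for a hypothetical single-qubit objective $f(\theta) = c_Z \langle Z \rangle + c_X \langle X \rangle$ with $|\psi(\theta)\rangle = R_Y(\theta)|0\rangle$, so that $\langle Z \rangle = \cos\theta$ and $\langle X \rangle = \sin\theta$: one Hamiltonian term is sampled per step, its parameter-shift gradient is estimated with only $k$ simulated measurement shots per shifted circuit, and the resulting unbiased estimator drives gradient descent with a diminishing step size. The toy Hamiltonian and all function names (shot_estimate, doubly_stochastic_grad) are illustrative assumptions, not part of the paper.

```python
# Minimal sketch of a "doubly stochastic" gradient estimator on a toy
# single-qubit objective f(theta) = c_Z <Z> + c_X <X>, where the state
# RY(theta)|0> gives <Z> = cos(theta) and <X> = sin(theta).
import numpy as np

rng = np.random.default_rng(0)

coeffs = np.array([0.7, -0.4])   # toy coefficients (c_Z, c_X), illustrative only
exact_exps = [np.cos, np.sin]    # exact <Z>, <X> as functions of theta

def shot_estimate(expectation, k):
    """Simulate k single-shot +/-1 measurements of a Pauli observable with
    true expectation value `expectation`, and return the empirical mean."""
    p_plus = (1.0 + expectation) / 2.0        # Born-rule probability of outcome +1
    outcomes = np.where(rng.random(k) < p_plus, 1.0, -1.0)
    return outcomes.mean()

def doubly_stochastic_grad(theta, k=1):
    """Unbiased estimator of df/dtheta: sample one Hamiltonian term, then
    apply the parameter-shift rule with k shots per shifted expectation."""
    total = np.abs(coeffs).sum()
    j = rng.choice(len(coeffs), p=np.abs(coeffs) / total)  # sample one term
    weight = np.sign(coeffs[j]) * total       # importance weight keeps it unbiased
    # Parameter-shift rule: d<P_j>/dtheta = (<P_j>(theta+pi/2) - <P_j>(theta-pi/2)) / 2
    e_plus = shot_estimate(exact_exps[j](theta + np.pi / 2), k)
    e_minus = shot_estimate(exact_exps[j](theta - np.pi / 2), k)
    return weight * (e_plus - e_minus) / 2.0

# Gradient descent with single-shot estimates and a diminishing step size.
theta = 0.1
for step in range(2000):
    lr = 0.1 / np.sqrt(step + 1)
    theta -= lr * doubly_stochastic_grad(theta, k=1)

exact_grad = -coeffs[0] * np.sin(theta) + coeffs[1] * np.cos(theta)
print(f"theta = {theta:.3f}, exact gradient at theta = {exact_grad:.3f}")
```

Sampling terms with probability proportional to $|c_j|$ and reweighting by $\mathrm{sign}(c_j) \sum_i |c_i|$ keeps the estimator unbiased, which is the property the convergence guarantees rely on; the same structure extends to sums over local Hamiltonian terms, shift directions, or data-set instances.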
