A quantum search decoder for natural language processing

Probabilistic language models, e.g. those based on an LSTM, often face the problem of finding a high-probability prediction from a sequence of random variables over a set of tokens. This is commonly addressed using a form of greedy decoding such as beam search, where only a limited number of highest-likelihood paths of the decoder (the beam width) are kept at each step, and the maximum-likelihood path among them is chosen at the end. In this work, we construct a quantum algorithm to find the globally optimal parse (i.e. the result for infinite beam width) with high constant success probability. When the input to the decoder is distributed as a power law with exponent $k>0$, our algorithm has runtime $R^{n f(R,k)}$, where $R$ is the alphabet size and $n$ the input length. Here $f<1/2$, and $f\rightarrow 0$ exponentially fast with increasing $k$, so our algorithm is always more than quadratically faster than its classical counterpart. We further modify our procedure to recover a finite beam width variant, which enables an even stronger empirical speedup while still retaining higher accuracy than is possible classically. Finally, we apply this quantum beam search decoder to Mozilla's implementation of Baidu's DeepSpeech neural net, which we show to exhibit such a power-law word rank frequency.
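
To make the difference between finite and infinite beam width concrete, the following is a minimal classical sketch in Python. It is not the quantum algorithm of this work: the two-token alphabet, the `next_probs` toy model and all function names are hypothetical and chosen only for illustration. It shows how beam search with a small width can miss the globally most likely sequence, which exhaustive search over all $R^n$ sequences (the infinite beam width limit) always finds.

```python
from itertools import product

# Hypothetical two-token alphabet (R = 2), for illustration only.
TOKENS = ("a", "b")


def next_probs(prefix):
    """Toy conditional model P(next token | prefix); an assumption, not a trained LSTM."""
    if prefix and prefix[-1] == "b":
        return {"a": 0.9, "b": 0.1}
    if prefix:
        return {"a": 0.5, "b": 0.5}
    return {"a": 0.6, "b": 0.4}


def sequence_prob(seq):
    """Probability of a full sequence under the toy model."""
    p = 1.0
    for i, tok in enumerate(seq):
        p *= next_probs(seq[:i])[tok]
    return p


def beam_search(n, width):
    """Keep only the `width` most likely prefixes at every step."""
    beams = [((), 1.0)]
    for _ in range(n):
        candidates = [
            (prefix + (tok,), p * q)
            for prefix, p in beams
            for tok, q in next_probs(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:width]
    return beams[0]


def exhaustive_search(n):
    """Infinite beam width: score all R**n sequences and return the best one."""
    scored = ((seq, sequence_prob(seq)) for seq in product(TOKENS, repeat=n))
    return max(scored, key=lambda c: c[1])


if __name__ == "__main__":
    print("beam width 1:", beam_search(2, 1))
    print("beam width 2:", beam_search(2, 2))
    print("exhaustive  :", exhaustive_search(2))
```

In this toy model, width 1 commits to the locally best first token `a` and ends with a sequence of probability about 0.3, whereas the sequence `b, a` has probability about 0.36. The exhaustive variant recovers it, but at a cost that grows as $R^n$; this is the cost that the runtime $R^{n f(R,k)}$ stated above improves upon.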
