Recurrent Neural Networks as Weighted Language Recognizers

We investigate the computational complexity of various problems for simple recurrent neural networks (RNNs) as formal models for recognizing weighted languages. We focus on single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications. We show that most problems for such RNNs are undecidable, including consistency, equivalence, minimization, and the determination of the highest-weighted string. However, for consistent RNNs this last problem becomes decidable, although the length of the solution can surpass all computable bounds. If, additionally, the string is restricted to polynomial length, the problem becomes NP-complete and APX-hard. In summary, this shows that approximations and heuristic algorithms are necessary in practical applications of such RNNs.
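To make the model concrete, here is a minimal sketch of the kind of recognizer the abstract describes: a single-layer Elman-style RNN with ReLU activation and a softmax output, which assigns each string a weight equal to the product of the per-symbol softmax probabilities, followed by the probability of an end-of-string symbol. The parameter values, vocabulary, and function names below are hypothetical illustrations, not the paper's construction.

```python
import math

# Hypothetical toy parameters (rational values chosen by hand for illustration).
VOCAB = ["a", "b", "$"]  # "$" marks end of string
H = 2                    # hidden size

W = [[0.5, -0.25], [0.0, 0.5]]            # hidden-to-hidden weights
U = {"a": [1.0, 0.0], "b": [0.0, 1.0]}    # input embeddings
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hidden-to-output, one row per symbol

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(scores):
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def output_dist(h):
    """Softmax distribution over VOCAB given the current hidden state."""
    return softmax([sum(V[k][j] * h[j] for j in range(H))
                    for k in range(len(VOCAB))])

def step(h, sym):
    """One recurrence step: h' = ReLU(W h + U[sym])."""
    pre = [sum(W[i][j] * h[j] for j in range(H)) + U[sym][i] for i in range(H)]
    return relu(pre)

def string_weight(s):
    """Weight of string s: product of the softmax probabilities of each
    symbol (conditioned on the prefix read so far) and of "$" at the end."""
    h = [0.0] * H
    weight = 1.0
    for sym in s:
        weight *= output_dist(h)[VOCAB.index(sym)]
        h = step(h, sym)
    return weight * output_dist(h)[VOCAB.index("$")]

print(string_weight("ab"))
```

Consistency, in this setting, asks whether the weights of all strings sum to one; the abstract's undecidability results say that even this basic property cannot be checked algorithmically for such networks.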
