Connecting First and Second Order Recurrent Networks with Deterministic Finite Automata

We propose an approach that connects recurrent networks with different orders of hidden interaction to regular grammars of different levels of complexity. We argue that this correspondence between recurrent networks and formal computational models offers insight into the analysis of the complicated behaviors of recurrent networks. We introduce an entropy value that categorizes all regular grammars into three classes of increasing complexity, and show that several existing recurrent networks match grammars from either all classes or only a subset of them. The differences between these grammar classes thus reveal the different properties of the corresponding models. We also provide a unified formulation of all investigated recurrent networks. Our evaluation shows that the unified recurrent network achieves improved performance in learning grammars and performs comparably to more complicated models on a real-world dataset.
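To make the notion of hidden-interaction order concrete, the minimal sketch below contrasts a first-order (Elman-style) state update with a second-order update in the spirit of second-order recurrent networks, where a weight tensor couples each hidden unit with each input symbol so that every symbol effectively selects its own hidden-to-hidden transition, mirroring a DFA's transition table. The shapes, variable names, and random weights are illustrative assumptions for exposition, not the paper's unified model.

    import numpy as np

    rng = np.random.default_rng(0)
    H, A = 8, 3          # hidden size, alphabet size (one-hot inputs); assumed values

    W1, U1 = rng.normal(size=(H, A)), rng.normal(size=(H, H))   # first-order weights
    W2 = rng.normal(size=(H, H, A))                             # second-order weight tensor

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def first_order_step(h, x):
        # h_{t+1} = sigma(W x_t + U h_t): input and hidden state enter additively
        return sigmoid(W1 @ x + U1 @ h)

    def second_order_step(h, x):
        # h_{t+1, i} = sigma(sum_{j,k} W_{ijk} h_{t,j} x_{t,k}):
        # the multiplicative interaction lets each input symbol pick out
        # its own recurrent transition matrix
        return sigmoid(np.einsum('ijk,j,k->i', W2, h, x))

    h = np.zeros(H)
    for symbol in [0, 2, 1]:             # a toy input string over the alphabet {0, 1, 2}
        x = np.eye(A)[symbol]
        h = second_order_step(h, x)

In this toy setting, reading off the dominant transition induced by each one-hot symbol is what makes the second-order formulation align naturally with deterministic finite automata, whereas the first-order update mixes symbol and state contributions additively.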
