Group-Linking Method: A Unified Benchmark for Machine Learning with Recurrent Neural Network

This paper proposes the Group-Linking Method, which controls the complexity of the sequential function so as to construct Finite Memory Machines (FMMs) of minimal order, that is, machines with the largest number of states attainable for their memory taps. Finding a machine with the maximum number of states is a nontrivial problem because the total number of machines of memory order $k$ is $(256)^{2^{k-2}} = 2^{2^{k+1}}$, which grows doubly exponentially in $k$. Based on the analysis of the Group-Linking Method, it is shown that the data necessary to reconstruct an FMM is the set of strings no longer than the depth of the machine plus one, which is significantly less than that required by traditional greedy-based machine learning algorithms. The Group-Linking Method thus provides a systematic way of generating unified benchmarks for evaluating the capability of machine learning techniques; one application is testing the learning capability of recurrent neural networks. The problem of encoding finite state machines with recurrent neural networks has been extensively explored. However, the great representational power of these networks does not guarantee that a solution can actually be found by learning. Previous learning benchmarks are shown to lack structural richness in terms of solutions in weight space. This set of benchmarks, with its great expressive power, can serve as a convenient framework in which to study the learning and computation capabilities of various network models. A fundamental understanding of the capabilities of these networks will allow users to select the most appropriate model for a given application.
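To make the counting argument concrete, the following is a minimal Python sketch, not taken from the paper, assuming the simplest reading of memory order: an order-$k$ machine computes a Boolean function of the current input together with the $k$ most recent inputs, which yields exactly $2^{2^{k+1}} = (256)^{2^{k-2}}$ distinct truth tables, matching the count quoted above. It enumerates every order-$k$ machine for a tiny $k$, counts each machine's distinguishable states with standard Moore-style minimization over the shift-register states (not the Group-Linking Method itself), and reports the maximum.

```python
from itertools import product

K = 3  # memory order; kept tiny because the number of machines is 2^(2^(K+1))

def all_machines(k):
    """Yield every order-k machine as a truth table mapping the
    (k+1)-tuple (current input, k most recent inputs) to an output bit.
    There are 2^(2^(k+1)) = (256)^(2^(k-2)) such tables."""
    args = list(product((0, 1), repeat=k + 1))
    for bits in product((0, 1), repeat=len(args)):
        yield dict(zip(args, bits))

def num_states(f, k):
    """Count the distinguishable states of the machine defined by f.
    The raw state is the shift register holding the last k inputs;
    Moore-style partition refinement merges equivalent states."""
    states = list(product((0, 1), repeat=k))

    def step(s, x):  # shift the new input into the register
        return (x,) + s[:-1]

    def out(s, x):   # output bit for state s on input x
        return f[(x,) + s]

    # initial partition: group states by their output row
    sig = {s: (out(s, 0), out(s, 1)) for s in states}
    while True:
        refined = {s: (sig[s], sig[step(s, 0)], sig[step(s, 1)]) for s in states}
        if len(set(refined.values())) == len(set(sig.values())):
            return len(set(sig.values()))
        sig = refined

best = max(num_states(f, K) for f in all_machines(K))
print(f"order {K}: {2 ** (2 ** (K + 1))} machines, "
      f"maximum {best} states (upper bound 2^{K} = {2 ** K})")
```

Already at $k = 4$ there are $2^{32}$ (over four billion) truth tables, so exhaustive search of this kind is hopeless; that scale problem is precisely what a constructive method for generating maximal-state machines is meant to sidestep.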
