Architectural Bias in Recurrent Neural Networks: Fractal Analysis

We have recently shown that recurrent neural networks (RNNs) with standard sigmoid-type activation functions, when initialized with "small" weights, are inherently biased towards Markov models: even prior to any training, finite memory machines can be readily extracted from the RNN dynamics [6,8]. Following [2], we refer to this phenomenon as the architectural bias of RNNs. In this paper we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of the recurrent activation patterns. We obtain both lower and upper bounds on several types of fractal dimension, such as the box-counting and Hausdorff dimensions.
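To make the setting concrete, the following minimal Python sketch (our illustration, not the paper's experimental code) drives an untrained sigmoid RNN with small random weights by a random symbol stream and estimates the box-counting dimension of the visited activation states; all parameter names and values (N_HIDDEN, WEIGHT_SCALE, and so on) are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_HIDDEN = 2        # state dimension (2-D, so boxes are squares)
N_SYMBOLS = 4       # alphabet size of the driving input stream
WEIGHT_SCALE = 0.5  # "small" weights -> contractive state maps
T = 200_000         # length of the random symbol sequence

W = WEIGHT_SCALE * rng.standard_normal((N_HIDDEN, N_HIDDEN))
W_in = rng.standard_normal((N_HIDDEN, N_SYMBOLS))
b = rng.standard_normal(N_HIDDEN)

def step(x, s):
    """One state update for input symbol s (one-hot encoded)."""
    u = np.zeros(N_SYMBOLS)
    u[s] = 1.0
    return 1.0 / (1.0 + np.exp(-(W @ x + W_in @ u + b)))  # logistic sigmoid

# Drive the untrained network with an i.i.d. random symbol stream and
# collect the visited states, discarding an initial transient. With small
# weights each per-symbol map is a contraction, so the activations settle
# onto a set resembling the attractor of an iterated function system.
x = 0.5 * np.ones(N_HIDDEN)
states = []
for t in range(T):
    x = step(x, rng.integers(N_SYMBOLS))
    if t > 1000:
        states.append(x.copy())
states = np.array(states)

# Crude box-counting estimate: count occupied boxes of side 1/n at dyadic
# scales and fit the slope of log N(1/n) against log n.
scales = [2**k for k in range(2, 9)]
counts = []
for n in scales:
    boxes = set(map(tuple, np.floor(states * n).astype(int)))
    counts.append(len(boxes))
slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
print(f"estimated box-counting dimension ~ {slope:.2f}")
```

Consistent with the architectural-bias picture, increasing WEIGHT_SCALE weakens the contraction of the per-symbol maps, and the IFS-like organization of the activation set degrades.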

[1] James L. McClelland et al. Finite State Automata and Simple Recurrent Networks. Neural Computation, 1989.

[2] Wolfgang Thomas. Automata on Infinite Objects. Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics, 1991.

[3] Karel Culik et al. Affine automata and related techniques for generation of complex images. Theoretical Computer Science, 1993.

[4] Jordan B. Pollack et al. Fractal (Reconstructive Analogue) Memory. 1992.

[5] Barbara Hammer et al. Recurrent networks for structured data – a unifying approach and its properties. Cognitive Systems Research, 2002.

[6] C. Lee Giles et al. Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks. Neural Computation, 1992.

[7] Jordan B. Pollack et al. Analysis of Dynamical Recognizers. Neural Computation, 1997.

[8] Jordan B. Pollack et al. A gradient descent method for a neural fractal memory. Proceedings of the 1998 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 1998.

[9] Panagiotis Manolios et al. First-Order Recurrent Neural Networks and Deterministic Finite State Automata. Neural Computation, 1994.

[10] K. Doya. Bifurcations in the learning of recurrent neural networks. Proceedings of the 1992 IEEE International Symposium on Circuits and Systems, 1992.

[11] Paul Rodríguez et al. A Recurrent Neural Network that Learns to Count. Connection Science, 1999.

[12] M. Barnsley et al. Recurrent iterated function systems. 1989.

[13] James L. McClelland et al. Learning Subsequential Structure in Simple Recurrent Networks. NIPS, 1988.

[14] Jeffrey L. Elman. Finding Structure in Time. Cognitive Science, 1990.

[15] Mikel L. Forcada et al. Learning the Initial State of a Second-Order Recurrent Neural Network during Regular-Language Inference. Neural Computation, 1995.

[16] Kenneth Falconer. Fractal Geometry: Mathematical Foundations and Applications. 1990.

[17] Nick Chater et al. Toward a connectionist model of recursion in human linguistic performance. 1999.

[18] Mikael Bodén et al. Learning the Dynamics of Embedded Clauses. Applied Intelligence, 2003.

[19] Raymond L. Watrous et al. Induction of Finite-State Languages Using Second-Order Recurrent Networks. Neural Computation, 1992.

[20] Ludwig Staiger et al. Valuations and Unambiguity of Languages, with Applications to Fractal Geometry. ICALP, 1994.

[21] John J. Sarraille et al. FD3: A Program for Measuring Fractal Dimension. 1994.

[22] Alessandro Sperduti et al. A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks, 1998.

[23] Barbara Hammer et al. Neural networks with small weights implement finite memory machines. 2002.

[24] Ludwig Staiger et al. Iterated Function Systems and Control Languages. Information and Computation, 2001.

[25] Przemyslaw Prusinkiewicz et al. Escape-time visualization method for language-restricted iterated function systems. 1992.

[26] Michael F. Barnsley. Fractals Everywhere. 1988.

[27] Mike Casey. The Dynamics of Discrete-Time Computation, with Application to Recurrent Neural Networks and Finite State Machine Extraction. Neural Computation, 1996.

[28] Peter Tiňo et al. Attractive Periodic Sets in Discrete-Time Recurrent Networks (with Emphasis on Fixed-Point Stability and Bifurcations in Two-Neuron Networks). Neural Computation, 2001.