A multiobjective genetic algorithm for obtaining the optimal size of a recurrent neural network for grammatical inference

Grammatical inference has been extensively studied in recent years owing to its wide range of applications, and recurrent neural networks have proved to be a good tool for it. The learning algorithms for these networks, however, have been studied far less than those for feed-forward neural networks. Classical training methods for recurrent neural networks suffer from being trapped in local minima and from high computational cost. In addition, selecting the optimal size of a neural network for a particular application is a difficult task. Both problems, determining optimal topologies and developing new training algorithms, therefore deserve study. In this paper, we present a multiobjective evolutionary algorithm that is able to determine the optimal size of a recurrent neural network for any particular application. We analyze this especially in the case of grammatical inference: in particular, we study how to establish the optimal size of a recurrent neural network that learns positive and negative examples of a given language, and how to derive the corresponding automaton from the trained network using a self-organizing map once training has been completed.
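
The core idea, treating size selection as a trade-off between classification error and network complexity, can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: evaluate_rnn is a hypothetical stand-in for training a recurrent network of a given hidden size on the labelled strings, and a full multiobjective GA would iterate this Pareto-selection step with crossover and mutation over the size encoding.

```python
import random

random.seed(0)

def evaluate_rnn(num_hidden, examples):
    """Placeholder for the expensive step: train a recurrent network with
    `num_hidden` units on the labelled strings and return its error in [0, 1].
    The error model below is a stand-in assumption, not the paper's method."""
    return max(0.0, 0.5 - 0.04 * num_hidden) + random.uniform(0.0, 0.05)

def dominates(a, b):
    """Pareto dominance on (error, size): no worse in both objectives,
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population, examples):
    """Keep the nondominated candidates; each fitness is (error, size)."""
    fitness = {ind: (evaluate_rnn(ind, examples), ind) for ind in population}
    return [i for i, f in fitness.items()
            if not any(dominates(g, f) for j, g in fitness.items() if j != i)]

population = random.sample(range(1, 16), 10)   # candidate hidden-layer sizes
print(sorted(pareto_front(population, examples=[])))
```

Minimizing error and size jointly yields a Pareto front of network sizes rather than a single winner, from which the smallest network with acceptable error can be chosen.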

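The extraction step mentioned in the abstract can be sketched similarly: once training is complete, cluster the hidden-state vectors visited by the network with a self-organizing map, then read successive winning units as states of the inferred automaton. Everything below, including the 1-D map topology and the choice of unit 0 as the start state, is an illustrative assumption rather than the paper's exact procedure.

```python
import numpy as np

def train_som(states, n_units=8, epochs=100, lr0=0.5):
    """Minimal 1-D SOM: `states` is an (N, d) array of hidden activations."""
    rng = np.random.default_rng(0)
    weights = rng.uniform(size=(n_units, states.shape[1]))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                      # decaying learning rate
        radius = max(1.0, n_units / 2 * (1 - t / epochs))  # shrinking neighborhood
        for x in rng.permutation(states):
            winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            for j in range(n_units):
                h = np.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                weights[j] += lr * h * (x - weights[j])
    return weights

def extract_transitions(trajectories, weights):
    """Each trajectory is a list of (symbol, hidden_state) pairs recorded while
    the trained RNN reads a string; successive winning SOM units define the
    transition table of the extracted automaton."""
    delta = {}
    for traj in trajectories:
        prev = 0  # assumption: unit 0 plays the role of the start state
        for symbol, state in traj:
            nxt = int(np.argmin(np.linalg.norm(weights - state, axis=1)))
            delta[(prev, symbol)] = nxt
            prev = nxt
    return delta
```

Units that no trajectory visits can be discarded, and the accepting states are those units that win for the final hidden state of accepted strings.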