Adaptive processing of structural data: from sequences to trees and beyond

There are several challenging real-world problems, for instance in medical and technical diagnosis, molecular biology, or document and image processing, where the objects of interest are significantly structured, their component parts are of a continuous nature, and the data are subject to various types of noise. For a number of practical tasks the solution can be described by example data, while prior expert knowledge about the relevant structural concepts is absent, or only partial and uncertain. It would therefore be advantageous to have methods and tools that (automatically) infer the solution, i.e. the desired input-output mapping, from the given example data. In computer science, structural (for example causal, topological, or hierarchical) relationships between the parts of an object are commonly represented by symbolic formalisms such as graphs, terms, or diagrams. Symbolic machine learning approaches can deal with these representations, but fail if the range of the intended mapping is continuous. Existing analog models of computation and learning, on the other hand, are tailored to the processing of continuous information, but assume that the data are organized in comparatively simple structures, by and large arrays and sequences. This work contributes to bridging this gap. We propose tree-recursive dynamical systems (TRDS), a new class of deterministic state machines that operate in a continuous state space and enable both the representation and the inductive inference of structure mappings. The most general admissible domain is the set of rooted labeled ordered trees (and a certain class of rooted labeled directed ordered acyclic graphs) whose vertices are labeled by continuous feature vectors. The range of these mappings may be either (a subspace of) Euclidean vector space or a finite set of categorical values.
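To make the TRDS idea concrete, the following is a minimal sketch of a bottom-up state computation over a rooted labeled ordered tree. The abstract does not fix a particular transition map; the single-layer tanh transition, the zero "frontier" state for missing children, and all names below are illustrative assumptions, not the architecture developed in this work.

```python
import numpy as np

def trds_state(node, W_label, W_child, b):
    """Bottom-up state computation of a tree-recursive dynamical system.

    A node is a pair (label, children): `label` is an n-vector of
    continuous features, `children` an ordered list of subtrees with
    arity at most k. The state of a node is a deterministic function of
    its own label and the states of its ordered children; positions
    without a child contribute the zero ("frontier") state.

    Shapes (assumed): label (n,), W_label (m, n), W_child (k, m, m), b (m,).
    """
    label, children = node
    k, m, _ = W_child.shape
    child_states = [trds_state(c, W_label, W_child, b) for c in children]
    # pad the ordered child states up to the fixed maximum arity k
    child_states += [np.zeros(m)] * (k - len(child_states))
    pre = W_label @ label + b
    for i, s in enumerate(child_states):
        pre = pre + W_child[i] @ s
    return np.tanh(pre)  # continuous state in (-1, 1)^m

def trds_output(root_state, W_out, c):
    """Output map applied to the root state (here: a linear readout)."""
    return W_out @ root_state + c
```

Because the state transition is applied recursively from the leaves upward, the root state is a fixed-size continuous encoding of a tree of arbitrary shape, to which a regression or classification readout can then be attached.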
Adaptivity is incorporated into TRDS by choosing parameterized functions for the state transition map and the output map. Inductive learning tasks, such as the classification or regression of tree structures, can then be reformulated as the optimization (minimization) of an error criterion. If the error criterion is a continuously differentiable function, gradient-based optimization methods are the standard means of solving the learning task. We develop and analyze two different algorithms for the calculation of gradient information: backpropagation through structure (BPTS) and tree-recursive gradient computation (TRGC). Both algorithms calculate the first-order gradient for arbitrary continuous and differentiable criteria in which tree structures are embedded via TRDS mappings. This enables attacking inductive learning tasks on structural data by means of TRDS and a variety of gradient-based optimization methods.
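The BPTS idea can be sketched for a simple parameterized TRDS: a forward pass computes and caches the state of every node bottom-up, and a backward pass pushes the error signal from each node into its ordered subtrees, accumulating parameter gradients along the way. The tanh transition, the squared-error criterion, and all names below are illustrative assumptions; the gradients are verified against a finite-difference check rather than taken on faith.

```python
import numpy as np

def forward(node, p):
    """Forward pass: returns (state, cache) for one node, recursively."""
    label, children = node
    k, m, _ = p["Wc"].shape
    kids = [forward(c, p) for c in children]
    states = [s for s, _ in kids] + [np.zeros(m)] * (k - len(kids))
    pre = p["Wl"] @ label + p["b"]
    for i, s in enumerate(states):
        pre = pre + p["Wc"][i] @ s
    state = np.tanh(pre)
    return state, (label, states, kids, state)

def backward(cache, dstate, p, g):
    """Backpropagation through structure: distribute dE/d(state) of a
    node to its parameters and to the states of its ordered children."""
    label, states, kids, state = cache
    delta = (1.0 - state**2) * dstate          # through tanh
    g["Wl"] += np.outer(delta, label)
    g["b"] += delta
    for i, s in enumerate(states):
        g["Wc"][i] += np.outer(delta, s)       # zero-padded slots add 0
    for i, (_, child_cache) in enumerate(kids):
        backward(child_cache, p["Wc"][i].T @ delta, p, g)

def loss_and_grads(tree, y, p):
    """Squared-error criterion through the TRDS embedding, with grads."""
    s_root, cache = forward(tree, p)
    err = p["Wo"] @ s_root + p["c"] - y
    g = {name: np.zeros_like(v) for name, v in p.items()}
    g["Wo"] = np.outer(err, s_root)
    g["c"] = err
    backward(cache, p["Wo"].T @ err, p, g)
    return 0.5 * float(err @ err), g
```

With gradients available, any first-order optimizer (plain gradient descent, conjugate gradients, etc.) can minimize the criterion over the parameter set; the backward recursion visits each vertex once, so one gradient evaluation costs time linear in the size of the tree.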
