Learning Efficiently with Neural Networks: A Theoretical Comparison between Structured and Flat Representations

We study the relationship between learning efficiency and data representation in supervised neural networks for pattern classification trained by continuous error minimization techniques such as gradient descent. In particular, we focus on a recently introduced architecture, the recursive neural network (RNN), which can learn class membership of patterns represented as labeled directed ordered acyclic graphs (DOAGs). RNNs offer several benefits over feedforward networks and recurrent networks for sequences; however, how RNNs compare to these models in terms of learning efficiency still requires investigation. In this paper we provide a theoretical answer by establishing a set of results on the shape of the error surface and by critically discussing their implications for the relative difficulty of learning with different data representations. The message of this paper is that, whenever structured representations are available, they should be preferred to "flat" (array-based) representations, because they are likely to simplify learning in terms of time complexity.
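To make the setting concrete, the sketch below shows the kind of computation such an RNN performs on a labeled DOAG: a state vector is attached to each node and computed bottom-up from the node's label and the states of its ordered children, and the root state feeds a classifier. This is a minimal illustration under assumed dimensions, weight names, and a toy graph; it is not the paper's actual formulation or experimental setup.

```python
import numpy as np

# Minimal sketch of a recursive neural network over a labeled DOAG.
# All sizes, names, and the toy graph below are illustrative assumptions.

rng = np.random.default_rng(0)

LABEL_DIM = 4   # size of each node's label vector (assumed)
STATE_DIM = 8   # size of the hidden state attached to each node (assumed)
MAX_ARITY = 2   # maximum out-degree of the DOAG

# One weight matrix for the label and one per child position; the
# position-dependent child weights are what make the children "ordered".
W_label = rng.normal(scale=0.1, size=(STATE_DIM, LABEL_DIM))
W_child = rng.normal(scale=0.1, size=(MAX_ARITY, STATE_DIM, STATE_DIM))
b = np.zeros(STATE_DIM)
w_out = rng.normal(scale=0.1, size=STATE_DIM)  # classification head

def node_state(label, child_states):
    """State transition: combine a node's label with its children's states."""
    h = W_label @ label + b
    for pos, s in enumerate(child_states):
        h += W_child[pos] @ s
    return np.tanh(h)

def topological_order(children):
    """DFS-based topological sort; the root is assumed to be node 0."""
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for c in children[n]:
            visit(c)
        order.append(n)
    visit(0)
    return order[::-1]  # root first

def classify_doag(labels, children):
    """Process a DOAG bottom-up and classify it from the root state.

    labels:   dict node -> label vector
    children: dict node -> ordered list of children (len <= MAX_ARITY)
    """
    order = topological_order(children)
    states = {}
    # Visit leaves first so every child state exists before its parents;
    # a shared substructure is computed exactly once.
    for node in reversed(order):
        states[node] = node_state(labels[node],
                                  [states[c] for c in children[node]])
    root = order[0]
    return 1.0 / (1.0 + np.exp(-w_out @ states[root]))  # sigmoid class score

# Toy DOAG: node 0 is the root; node 3 is shared by nodes 1 and 2.
children = {0: [1, 2], 1: [3], 2: [3], 3: []}
labels = {n: rng.normal(size=LABEL_DIM) for n in children}
print("class score:", classify_doag(labels, children))
```

Processing nodes in reverse topological order is what distinguishes this from a recurrent network unrolled over a sequence: the same transition function is applied along the partial order of the graph rather than along a linear chain, and a shared subgraph contributes a single state to all of its parents.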
