Overfitting and generalization in learning discrete patterns

Abstract. Understanding and preventing overfitting is an important issue in the design, implementation, and application of artificial neural networks. Weigend [10] reports that the presence or absence of overfitting in neural networks depends on how the testing error is measured, and that no overfitting occurs in terms of the classification error (symbolic-level error). In this paper, we show that, in terms of the classification error, overfitting does occur for certain representations used to encode the discrete attributes. We design simple Boolean functions with a clear rationale, and present experimental results to support our claims. In addition, we report findings on how the best generalization ability of a network depends on its size.
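For illustration only, the sketch below shows the kind of measurement the abstract turns on: a small MLP is trained on a Boolean target while the continuous squared error and the thresholded, symbolic-level classification error are tracked on a held-out test set. The target (5-bit majority), the network size, the learning rate, and the train/test split are all assumptions made for this sketch, not the paper's experimental setup.

```python
# Minimal sketch (assumed setup, not the paper's experiments): contrast the
# squared test error with the symbolic-level classification error while a
# small sigmoid MLP learns a Boolean target (5-bit majority, chosen here).
import numpy as np

rng = np.random.default_rng(0)
n_bits = 5

def majority(bits):
    # Boolean target: 1 if more than half of the input bits are set.
    return (bits.sum(axis=1) > bits.shape[1] // 2).astype(float)

# Enumerate all 5-bit patterns and split them into train/test sets.
X = np.array([[(i >> b) & 1 for b in range(n_bits)] for i in range(2 ** n_bits)], float)
y = majority(X)
idx = rng.permutation(len(X))
train, test = idx[:20], idx[20:]

# One hidden layer of sigmoid units, plain batch gradient descent on squared error.
n_hidden = 16
W1 = rng.normal(0, 0.5, (n_bits, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, n_hidden);           b2 = 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(inputs):
    h = sigmoid(inputs @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

lr = 2.0
for epoch in range(3001):
    h, out = forward(X[train])
    err = out - y[train]
    # Backpropagate through the sigmoid output and hidden layer.
    d_out = err * out * (1 - out)
    d_h = np.outer(d_out, W2) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(train);        b2 -= lr * d_out.mean()
    W1 -= lr * X[train].T @ d_h / len(train);   b1 -= lr * d_h.mean(axis=0)

    if epoch % 500 == 0:
        _, p = forward(X[test])
        mse = np.mean((p - y[test]) ** 2)            # continuous error measure
        cls = np.mean((p > 0.5) != (y[test] > 0.5))  # symbolic-level error
        print(f"epoch {epoch:4d}  test MSE {mse:.3f}  test classification error {cls:.3f}")
```

The two error curves need not move together: the squared error can change while the thresholded classification error stays flat, which is the distinction between numeric-level and symbolic-level overfitting discussed above.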

[1] J. Stephen Judd et al. Optimal stopping and effective machine complexity in learning, 1993. Proceedings of the 1995 IEEE International Symposium on Information Theory.

[2] B. MacWhinney. Connections and symbols: closing the gap. Cognition, 1993.

[3] Terrence J. Sejnowski et al. Parallel Networks that Learn to Pronounce English Text. Complex Systems, 1987.

[4] Eytan Domany et al. Learning by Choice of Internal Representations. Complex Systems, 1988.

[5] Charles X. Ling et al. Learning the Past Tense of English Verbs: The Symbolic Pattern Associator vs. Connectionist Models. Journal of Artificial Intelligence Research, 1993.

[6] James L. McClelland et al. On learning the past tenses of English verbs: implicit rules or parallel distributed processing, 1986.

[7] V. Marchman et al. U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 1991.

[8] John F. Kolen et al. Backpropagation is Sensitive to Initial Conditions. Complex Systems, 1990.

[9] Tal Grossman et al. Use of Bad Training Data for Better Predictions. NIPS, 1993.

[10] Geoffrey E. Hinton et al. Distributed Representations. The Philosophy of Artificial Intelligence, 1986.

[11] Andreas Weigend. On overfitting and the effective number of hidden units, 1993.

[12] B. MacWhinney. The CHILDES project: tools for analyzing talk, 1992.

[13] C. Ling et al. Answering the connectionist challenge: a symbolic model of learning the past tenses of English verbs. Cognition, 1993.

[14] S. Pinker et al. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 1988.

[15] T. Bever et al. The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 1988.

[16] B. MacWhinney et al. Implementations are not conceptualizations: Revising the verb learning model. Cognition, 1991.

[17] Steven Pinker. Generalisation of regular and irregular morphological patterns, 1993.