Connectionism: with or without rules? Response to

algebraic rules. In their simulations (as in the Marcus et al. experiment), only half of the test sequences have the same structure as the sequences used in training. Nonetheless, the learning process quickly induces similarity between the representations of the novel and the familiar syllables. As a result, sequences made from the new elements cannot help but tap into the knowledge the system has built up about the sequential structure of the trained sequences, thereby producing generalization.

In summary, we have described a number of ways in which the type of generalization exhibited by infants in the Marcus et al. experiments might arise, not from abstract rules, but from the operation of statistical learning mechanisms whose existence is uncontested. We do not claim that any one of these possibilities is necessarily correct; our goal has simply been to point out that there are several alternatives to abstract, algebraic rules, and that the results do not implicate such rules because they provide no differential support for abstract rules relative to these alternatives.

Conclusion

Generalization of knowledge from given examples to new cases is crucial for intelligent behavior; as Marr [14] pointed out, experience never repeats itself, so our reaction to every experience depends to some degree on generalization. Marcus and his collaborators are right to emphasize the importance of generalization, and the experiments they have reported likely reflect impressive powers of generalization in infants. We have suggested, however, that some participants in the debate about the need for rules may have underestimated the potential of alternative forms of computation to address the problem of generalization, by mistakenly assuming that statistical learning procedures, including neural networks, are doomed to compute statistics only over 'given variables' [4]. In fact, neural networks make extensive use of internal representations onto which the given variables (i.e. the raw input) are mapped. What sets the most interesting statistical learning procedures used with neural networks apart from older (and, for some, more familiar) statistical procedures is that the network procedures can learn which internal representations ought to be assigned to the given variables. It seems likely to us that infants are born both with predispositions to encode inputs in particular ways and with powerful statistical learning procedures, like those currently used in network models, that can help them refine their initial predispositions and discover new ones. As far as we can …
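To make the mechanism described above concrete, here is a minimal sketch in Python/NumPy. It is not the simulations discussed in the text; the random feature encodings, the ABA-versus-ABB classification framing, the network size, and the training settings are all illustrative assumptions. What it illustrates is the point made in the opening paragraph: when syllables are encoded over shared features rather than treated as arbitrary distinct symbols, a sequence built from novel syllables necessarily engages the same weights that were trained on familiar ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical distributed encodings: each syllable is a vector of
# phonetic-like features, so novel syllables inevitably overlap with
# familiar ones in the input space.
N_FEATURES = 8
familiar = rng.normal(size=(12, N_FEATURES))   # training syllables
novel = rng.normal(size=(12, N_FEATURES))      # held-out syllables

def sequences(syllables, pattern):
    """Three-syllable sequences (A, B, A) or (A, B, B) as flat vectors."""
    rows = []
    for i in range(len(syllables)):
        for j in range(len(syllables)):
            if i == j:
                continue
            a, b = syllables[i], syllables[j]
            third = a if pattern == "ABA" else b
            rows.append(np.concatenate([a, b, third]))
    return np.array(rows)

def make_xy(syllables):
    aba = sequences(syllables, "ABA")
    abb = sequences(syllables, "ABB")
    X = np.vstack([aba, abb])
    y = np.concatenate([np.zeros(len(aba)), np.ones(len(abb))])
    return X, y

X_tr, y_tr = make_xy(familiar)
X_te, y_te = make_xy(novel)

# One-hidden-layer network trained by gradient descent; the hidden layer
# plays the role of the learned internal representation discussed above.
H, LR, EPOCHS = 32, 0.5, 3000
W1 = rng.normal(scale=0.3, size=(X_tr.shape[1], H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.3, size=(H, 1)); b2 = np.zeros(1)

for _ in range(EPOCHS):
    h = np.tanh(X_tr @ W1 + b1)                # internal representation
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # P(sequence is ABB)
    g_out = (p - y_tr[:, None]) / len(X_tr)    # cross-entropy gradient
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)      # backprop through tanh
    W2 -= LR * (h.T @ g_out); b2 -= LR * g_out.sum(0)
    W1 -= LR * (X_tr.T @ g_h); b1 -= LR * g_h.sum(0)

def accuracy(X, y):
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return float(((p[:, 0] > 0.5) == y).mean())

print("familiar-syllable sequences:", accuracy(X_tr, y_tr))
print("novel-syllable sequences:   ", accuracy(X_te, y_te))
```

The final two lines compare performance on sequences built from familiar and from novel syllables; to the extent that the network has picked up the sequential regularity through the overlapping encodings rather than memorizing particular items, the novel-syllable score will exceed chance. Had the syllables instead been given one-hot encodings, the novel items would activate only untrained weights, which is the sense in which generalization depends on the representations over which the statistics are computed.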