A statistical mechanical theory of learning from examples in layered networks at finite temperature is studied. When the training error is a smooth function of continuously varying weights, the generalization error falls off asymptotically as the inverse of the number of examples. By analytical and numerical studies of single-layer perceptrons, we show that when the weights are discrete the generalization error can exhibit a discontinuous transition to perfect generalization. For intermediate sizes of the example set, the state of perfect generalization coexists with a metastable spin-glass state.

Understanding how systems can be efficiently trained to perform tasks is of fundamental importance. A central issue in learning theory is the rate of improvement in the processing of novel data as a function of the number of examples presented during training, i.e., the generalization curve. Numerical results on training in layered neural networks indicate that the generalization error improves gradually in some cases and sharply in others. In this work we use statistical mechanics to study generalization curves in large layered networks. We will first discuss the general theory and then present results for learning in a single-layer perceptron.

The computational function of layered neural networks is described in terms of the input-output relations that they generate. We consider here a multilayer network with $M$ input nodes, whose states are denoted by $S$, and whose output is determined by the synaptic weights of the network. The network is trained by adjusting its weights to approximate or reproduce, if possible, a target function $\sigma_0(S)$ on the input space.
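
The learning setup described above can be made concrete with a small numerical toy. The following is a minimal sketch under stated assumptions, not the finite-temperature statistical mechanical treatment of this paper: it assumes that the target function is itself a single-layer perceptron with weights restricted to +/-1 (a "teacher"), that inputs are random +/-1 vectors, and that the network is small enough for every discrete weight vector to be enumerated exhaustively. All function and variable names are hypothetical. For growing numbers of examples it lists how many weight vectors still reproduce every training example, together with the largest estimated generalization error among them.

```python
import itertools
import numpy as np

def perceptron_output(w, S):
    """Sign of the weighted input sum; S has shape (P, N), w has shape (N,)."""
    return np.sign(S @ w)

def training_error(w, S, labels):
    """Fraction of the training examples that the weights w misclassify."""
    return float(np.mean(perceptron_output(w, S) != labels))

def consistent_weights(S, labels):
    """Enumerate all +/-1 weight vectors with zero training error (small N only)."""
    N = S.shape[1]
    return [np.array(w) for w in itertools.product((-1, 1), repeat=N)
            if training_error(np.array(w), S, labels) == 0.0]

def generalization_error(w, w_teacher, n_test=5000, seed=1):
    """Monte Carlo estimate of the disagreement rate on fresh random inputs."""
    rng = np.random.default_rng(seed)
    S = rng.choice((-1, 1), size=(n_test, len(w)))
    return float(np.mean(perceptron_output(w, S) != perceptron_output(w_teacher, S)))

N = 9  # assumed toy size; odd, so a sum of N terms equal to +/-1 is never zero
rng = np.random.default_rng(0)
w_teacher = rng.choice((-1, 1), size=N)    # assumed target: a discrete-weight perceptron
S_all = rng.choice((-1, 1), size=(60, N))  # random training inputs
labels_all = perceptron_output(w_teacher, S_all)

for P in (5, 10, 20, 40, 60):
    survivors = consistent_weights(S_all[:P], labels_all[:P])
    worst = max(generalization_error(w, w_teacher) for w in survivors)
    print(f"P={P:3d}  consistent weight vectors={len(survivors):4d}  "
          f"worst generalization error={worst:.3f}")
```

Because each example set here is a prefix of the next, the set of consistent weight vectors can only shrink as examples are added, and the teacher always belongs to it. Exhaustive enumeration is feasible only for very small networks, so this toy illustrates the definitions of training and generalization error rather than the large-network behavior analyzed in the text.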