On learning noisy threshold functions with finite precision weights

We address the precision required by an N-input threshold element to implement a linearly separable mapping. In contrast to previous work, we require only that the mapping of P randomly chosen training examples be implemented correctly, rather than the complete Boolean mapping. Our results are obtained within the statistical mechanics approach and are thus average-case results, as opposed to the worst-case analyses in the computational learning theory literature. We show that as long as the ratio P/N remains finite, then with probability approaching 1 as N → ∞, a finite number of bits suffices to implement the mapping. This should be compared with the worst-case analysis, which requires O(N log N) bits. We also calculate the ability of the constrained network to predict novel examples and compare its predictions to those of an unconstrained network. Finally, we address the performance of the finite-precision network in the face of noisy training examples.
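To make the setting concrete, the following is a minimal illustrative sketch (not the paper's statistical-mechanics calculation): a student perceptron whose weights are projected onto a fixed b-bit grid is trained on P = αN random ±1 examples labeled by an unconstrained teacher threshold function, and we check whether the quantized student can still realize the training mapping. All function names, the quantization grid, and the projected perceptron rule are hypothetical choices made for illustration.

```python
# Hypothetical sketch: finite-precision perceptron fitting P = alpha*N random
# examples generated by an unconstrained teacher threshold function.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits):
    """Round weights to a uniform grid with 2**bits levels on [-1, 1]."""
    levels = 2 ** bits - 1
    w = np.clip(w, -1.0, 1.0)
    return np.round((w + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

def train_quantized_perceptron(X, y, bits, epochs=200, lr=0.05):
    """Perceptron learning with weights projected onto the finite-precision
    grid after every update (a simple stand-in for the precision constraint)."""
    N = X.shape[1]
    w = quantize(rng.normal(size=N) / np.sqrt(N), bits)
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            if np.sign(x @ w) != t:
                w = quantize(w + lr * t * x / np.sqrt(N), bits)
                errors += 1
        if errors == 0:
            break
    return w

N, alpha, bits = 100, 0.5, 4              # alpha = P/N kept finite
P = int(alpha * N)
teacher = rng.normal(size=N)              # unconstrained teacher defining the mapping
X = rng.choice([-1.0, 1.0], size=(P, N))  # P random binary input patterns
y = np.sign(X @ teacher)

w = train_quantized_perceptron(X, y, bits)
train_acc = np.mean(np.sign(X @ w) == y)
print(f"training accuracy with {bits}-bit weights: {train_acc:.2f}")
```

Under these assumptions, one can vary `bits` at fixed α to observe that a small, N-independent number of bits already suffices to fit the P training examples, in the spirit of the average-case result stated above; the worst-case O(N log N)-bit requirement refers instead to realizing an entire Boolean threshold function exactly.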
