Nondeterministic Discretization of Weights Improves Accuracy of Neural Networks

The paper investigates a modification of the backpropagation algorithm that discretizes neural network weights after each training cycle. This modification, aimed at reducing overfitting, restricts the weights to a discrete subset of the real numbers, which substantially improves the network's generalization ability. This, in turn, yields higher accuracy and, in extreme cases of severe overfitting, a reduction in error rate of over 50%. Discretization is performed nondeterministically, so that the expected value of a discretized weight equals its original value; in this way, the global behavior of the original algorithm is preserved. The presented discretization method is general and may be applied to other machine-learning algorithms. It also illustrates how an algorithm for continuous optimization can be successfully applied to optimization over discrete spaces. The method was evaluated experimentally in the WEKA environment on two real-world data sets from the UCI repository.
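The core idea described above, rounding a weight to a neighboring grid point with probabilities chosen so that the expectation equals the original value, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the grid spacing `delta` and the function name are assumptions for the example.

```python
import math
import random

def stochastic_round(w, delta=0.25):
    """Round w to an adjacent multiple of delta, nondeterministically,
    so that the expected value of the result equals w.

    delta is a hypothetical grid spacing; the paper's actual
    discretization grid may differ.
    """
    lower = math.floor(w / delta) * delta   # nearest grid point below w
    upper = lower + delta                   # nearest grid point above w
    # Round up with probability proportional to w's distance from lower:
    # E[result] = lower * (1 - p_up) + upper * p_up = w
    p_up = (w - lower) / delta
    return upper if random.random() < p_up else lower
```

After each training cycle, every weight would be passed through such a function; because the rounding is unbiased, the discretization adds zero-mean noise rather than a systematic drift, which is why the global behavior of gradient descent is preserved.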
