Searching for Minimal Neural Networks in Fourier Space

The principle of minimum description length suggests looking for the simplest network that works well on the training examples, where simplicity is measured by the size of the network's description in a reasonable programming language for encoding networks. Previous work used an assembler-like universal network encoding language (NEL) and Speed Prior-based search (related to Levin's Universal Search) to quickly find low-complexity nets with excellent generalization performance. Here we define a more natural and often more practical NEL whose instructions are frequency domain coefficients. Frequency coefficients may be encoded by few bits, hence huge weight matrices may just be low-complexity superpositions of patterns computed by programs with few elementary instructions. On various benchmarks this weight matrix encoding greatly accelerates the search. The scheme was tested on pole balancing, the long-term dependency T-maze, and ball throwing. Some of the solutions turn out to be unexpectedly simple as they are computable by fairly short bit strings.
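The core idea above can be made concrete with a minimal sketch: a handful of frequency coefficients are decoded into a full weight matrix as a superposition of 2-D cosine basis patterns. This is an illustrative inverse-DCT-style decoder, not the paper's exact implementation; the function name `decode_weights`, the coefficient format, and the basis normalization are assumptions for the example.

```python
import numpy as np

def decode_weights(coeffs, rows, cols):
    """Decode a sparse dict {(i, j): value} of frequency-domain
    coefficients into a rows x cols weight matrix by summing 2-D
    cosine basis patterns (inverse DCT-II style, unnormalized).
    A short coefficient list thus describes a large matrix."""
    W = np.zeros((rows, cols))
    x = np.arange(rows)
    y = np.arange(cols)
    for (i, j), c in coeffs.items():
        # separable cosine basis pattern for frequency pair (i, j)
        basis = np.outer(np.cos(np.pi * (x + 0.5) * i / rows),
                         np.cos(np.pi * (y + 0.5) * j / cols))
        W += c * basis
    return W

# three low-frequency coefficients describe a 100x100 weight matrix
coeffs = {(0, 0): 0.5, (0, 1): -0.3, (1, 0): 0.2}
W = decode_weights(coeffs, 100, 100)
print(W.shape)  # (100, 100)
```

Because the description length is the number of encoded coefficients rather than the number of weights, search can operate in the small coefficient space while still producing networks with large, smoothly structured weight matrices.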
