A Telescopic Binary Learning Machine for Training Neural Networks

This paper proposes a new algorithm for training neural networks, the binary learning machine (BLM), based on multiscale stochastic local search over a binary representation of the weights. We study the effects of the neighborhood evaluation strategy, of the number of bits per weight, and of the maximum weight range used to map binary strings to real values. Following this preliminary investigation, we propose a telescopic multiscale version of local search in which the number of bits is increased adaptively, leading to a faster search and to local minima of better quality. We present an analysis of this dynamic adaptation of the number of bits. The resulting control over the number of bits, which arises naturally in the proposed method, is effective in improving generalization performance. The learning dynamics are discussed and validated on a highly nonlinear artificial problem and on real-world tasks from several application domains; finally, BLM is applied to a feedback-control problem requiring either feedforward or recurrent architectures.
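To make the mechanism concrete, the sketch below shows one plausible reading of the approach: a first-improvement stochastic local search over single-bit-flip moves, with a telescopic refinement that adds one bit per weight whenever no improving flip remains at the current resolution. The linear mapping from integer codes to weights in [-w_max, +w_max] follows the description above; the function and parameter names (decode, telescopic_search, start_bits, max_bits) are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of a telescopic binary local search, under the assumptions
# stated above (not the authors' reference code).
import random

def decode(codes, n_bits, w_max):
    """Map integer codes in [0, 2**n_bits - 1] linearly onto [-w_max, +w_max]."""
    scale = 2.0 * w_max / (2 ** n_bits - 1)
    return [-w_max + k * scale for k in codes]

def telescopic_search(loss, n_weights, w_max=10.0, start_bits=2, max_bits=12, seed=0):
    rng = random.Random(seed)
    n_bits = start_bits
    codes = [rng.randrange(2 ** n_bits) for _ in range(n_weights)]
    best = loss(decode(codes, n_bits, w_max))
    while True:
        improved = True
        while improved:  # stochastic local search at the current scale
            improved = False
            for i in rng.sample(range(n_weights), n_weights):
                for bit in range(n_bits):
                    codes[i] ^= 1 << bit          # flip one bit of one weight
                    trial = loss(decode(codes, n_bits, w_max))
                    if trial < best:
                        best = trial              # keep the improving move
                        improved = True
                    else:
                        codes[i] ^= 1 << bit      # undo the flip
        if n_bits >= max_bits:
            return decode(codes, n_bits, w_max), best
        # Local minimum at this resolution: telescope to a finer grid.
        # Doubling every code keeps each weight approximately in place
        # while halving the step size of subsequent bit flips.
        n_bits += 1
        codes = [2 * k for k in codes]

# Toy usage: minimize a quadratic "training error" over three weights.
w, err = telescopic_search(lambda w: sum((x - 1.5) ** 2 for x in w), n_weights=3)
```

In this reading, the coarse-to-fine schedule lets the early search take large steps through weight space, while later refinements behave like a progressively finer grid search around the current solution; capping max_bits acts as the implicit control on weight precision that the abstract credits with improving generalization.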
