Improving backpropagation learning with feature selection

Real-world data often contain redundant, irrelevant, and noisy attributes. Training a network on properly selected data can speed up training, simplify the learned structure, and improve performance. A two-phase training algorithm is proposed. In the first phase, the number of input units of the network is determined by an information-based method: only those attributes that meet a criterion for inclusion are used as inputs to the network. In the second phase, the number of hidden units is selected automatically based on the network's performance on the training data, with one hidden unit added at a time only when it is needed. Experimental results show that the new algorithm achieves faster learning, a simpler network, and improved performance.
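
The abstract does not spell out either phase, so the snippet below is only a minimal sketch of the two-phase idea under my own assumptions: phase one scores each attribute with an information-gain criterion (in the spirit of Quinlan-style decision-tree induction) and keeps those above a threshold, and phase two grows the hidden layer one unit at a time until training performance is acceptable. The names `information_gain`, `select_features`, and `grow_network`, the threshold and bin counts, and the use of scikit-learn's MLPClassifier are all hypothetical and not taken from the paper; the published algorithm may proceed differently (for example, by retaining the trained weights when a new unit is added), whereas this sketch simply retrains at each network size.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def information_gain(x, y, n_bins=10):
    """Information gain of one (discretized) attribute x with respect to labels y."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # Discretize the attribute so the criterion also applies to continuous data.
    bins = np.digitize(x, np.histogram_bin_edges(x, bins=n_bins))
    gain = entropy(y)
    for b in np.unique(bins):
        mask = bins == b
        gain -= mask.mean() * entropy(y[mask])
    return gain


def select_features(X, y, threshold=0.01):
    """Phase 1 (sketch): keep only attributes whose information gain exceeds a threshold."""
    gains = [information_gain(X[:, j], y) for j in range(X.shape[1])]
    return [j for j, g in enumerate(gains) if g > threshold]


def grow_network(X, y, max_hidden=20, target_accuracy=0.99):
    """Phase 2 (sketch): add one hidden unit at a time until training accuracy is acceptable."""
    for n_hidden in range(1, max_hidden + 1):
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,),
                            max_iter=2000, random_state=0)
        net.fit(X, y)
        if net.score(X, y) >= target_accuracy:
            break
    return net, n_hidden


# Usage sketch: run feature selection first, then grow the hidden layer.
# selected = select_features(X_train, y_train)
# net, n_hidden = grow_network(X_train[:, selected], y_train)
```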
