Paper information - Empirical modeling of very large data sets using neural networks

Empirical modeling of very large data sets using neural networks

Building empirical predictive models from very large data sets is challenging. One has to deal both with the 'curse of dimensionality' (hundreds or thousands of variables) and with 'too many records' (many thousands of instances). While neural networks [Rumelhart et al., 1986] are widely recognized as universal function approximators [Cybenko, 1989], their training time increases rapidly with the number of variables and instances. I discuss practical methods for overcoming this problem so that neural network models can be developed for very large databases. The methods include: dimensionality reduction with neural net modeling, PLS modeling, and bottleneck neural networks; sub-sampling and re-sampling with many smaller data sets to reduce training time; and a committee of networks to make the final prediction more robust and to estimate its uncertainty.
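As a rough illustration of the sub-sampling and committee ideas named in the abstract, the following minimal Python sketch trains many small models, each on its own random sub-sample, then reports the committee mean as the prediction and the spread across members as an uncertainty estimate. It is not the author's implementation: an ordinary least-squares line fit stands in for each neural network to keep the example short, and the names `fit_line`, `train_committee`, `committee_predict`, and `make_data`, along with the synthetic data and all parameter values, are assumptions made for this example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x (a stand-in for training one small network)."""
    mx = statistics.mean(xs)
    my = statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def train_committee(xs, ys, n_members=20, sample_size=200, seed=0):
    """Train each member on its own random sub-sample, drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(sample_size)]
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members


def committee_predict(members, x):
    """Return the committee mean and the standard deviation across members."""
    preds = [a + b * x for a, b in members]
    return statistics.mean(preds), statistics.stdev(preds)


def make_data(n=10000, seed=1):
    """Hypothetical synthetic data set: y = 2 + 3x plus unit Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_data()
    members = train_committee(xs, ys)
    mean, spread = committee_predict(members, 5.0)
    print("prediction:", mean, "spread across members:", spread)
```

Each member sees only 200 of the 10,000 records, so the cost of fitting one member is independent of the full data set size. The disagreement among members gives an inexpensive indication of how much the prediction depends on which records were sampled.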

Aaron J. Owens

[1] George Cybenko. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.

[2] B. Efron. The jackknife, the bootstrap, and other resampling plans, 1987.

[3] B. Efron, et al. The Jackknife: The Bootstrap and Other Resampling Plans, 1983.

[4] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[5] A. Owens, et al. Efficient training of the backpropagation network by solving a system of stiff ordinary differential equations, 1989, International 1989 Joint Conference on Neural Networks.

[6] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[7] Anders Krogh, et al. Introduction to the theory of neural computation, 1994, The advanced book program.