How Good a Shallow Neural Network Is for Solving Non-linear Decision Making Problems

The universal approximation theorem states that a shallow neural network (one hidden layer) can represent any non-linear function. In this paper, we examine how good a shallow neural network (SNN) is at solving non-linear decision-making problems. We propose a performance-driven incremental approach to searching for the best shallow neural network for decision making on a given data set. Experimental results on two benchmark data sets, Wisconsin Breast Cancer and SMS Spam, support the universal approximation theorem and show that a hidden layer with roughly half as many neurons as there are inputs is sufficient to represent the function underlying the data. The results also show that performance-driven BP learning is faster than error-driven BP learning, and that the performance of the SNN obtained by the former is no worse than that of the SNN obtained by the latter. This indicates that, when a neural network is trained with the BP algorithm, the performance quickly reaches a certain level, while the error may continue to decrease. On the two data sets, the performance of the SNNs is comparable to or better than that of the optimal linguistic attribute hierarchy, obtained either by a genetic algorithm in a wrapper or manually from semantics, both of which are far more time-consuming.
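To make the search procedure concrete, below is a minimal Python sketch of a performance-driven incremental search, assuming a sigmoid one-hidden-layer network trained by plain batch backpropagation. The stopping rule, learning rate, layer-size schedule, and synthetic XOR-like data are illustrative assumptions, not the paper's exact setup; the key idea it demonstrates is stopping on classification performance (accuracy) rather than on the error, and growing the hidden layer one neuron at a time.

```python
# Sketch of a performance-driven incremental search for a shallow
# neural network (SNN). All hyperparameters are illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_snn(X, y, n_hidden, lr=0.5, max_epochs=2000, patience=50):
    """Train a one-hidden-layer sigmoid network with batch BP; stop when
    accuracy (performance) stops improving, even if the squared error is
    still decreasing."""
    rng = np.random.default_rng(0)
    n_in = X.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, n_hidden)
    b2 = 0.0
    best_acc, stall = 0.0, 0
    for epoch in range(max_epochs):
        # Forward pass.
        H = sigmoid(X @ W1 + b1)      # hidden activations, shape (n, n_hidden)
        p = sigmoid(H @ W2 + b2)      # output probability, shape (n,)
        # Backward pass for squared error (1/2) * (p - y)^2.
        d_out = (p - y) * p * (1 - p)
        d_hid = np.outer(d_out, W2) * H * (1 - H)
        W2 -= lr * H.T @ d_out / len(y)
        b2 -= lr * d_out.mean()
        W1 -= lr * X.T @ d_hid / len(y)
        b1 -= lr * d_hid.mean(axis=0)
        # Performance-driven stopping: track accuracy, not the error.
        acc = np.mean((p > 0.5) == y)
        if acc > best_acc:
            best_acc, stall = acc, 0
        else:
            stall += 1
            if stall >= patience:
                break
    return best_acc

def incremental_search(X, y, target=0.95):
    """Grow the hidden layer one neuron at a time until the target
    performance is reached (or the input count is exhausted)."""
    n_in = X.shape[1]
    for n_hidden in range(1, n_in + 1):
        acc = train_snn(X, y, n_hidden)
        print(f"hidden neurons: {n_hidden:2d}  accuracy: {acc:.3f}")
        if acc >= target:
            return n_hidden
    return n_in

if __name__ == "__main__":
    # Synthetic non-linear (XOR-like) decision problem as a stand-in
    # for the benchmark data sets used in the paper.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, (400, 4))
    y = ((X[:, 0] * X[:, 1] > 0) ^ (X[:, 2] > 0)).astype(float)
    incremental_search(X, y)
```

Under these assumptions, the inner loop halts as soon as accuracy plateaus for `patience` epochs, which typically happens well before the squared error stops decreasing; the outer loop then reports how few hidden neurons suffice, mirroring the paper's observation that about half the input count is often enough.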
