Learnability and robustness of shallow neural networks learned by a performance-driven BP and a variant of PSO for edge decision-making

Computing resources are often limited, without the benefit of GPUs, especially on the edge devices of IoT-enabled systems, so it may not be feasible to deploy complex AI models there. The Universal Approximation Theorem states that a shallow neural network (SNN) can represent any nonlinear function. However, how wide must an SNN be to solve a nonlinear decision-making problem on edge devices? In this paper, we focus on the learnability and robustness of SNNs obtained by a greedy, tight-force heuristic algorithm (performance-driven back-propagation, PDBP) and a loose-force meta-heuristic algorithm (a variant of particle swarm optimisation, VPSO). Two groups of experiments examine the learnability and robustness of SNNs with sigmoid activation, learned/optimised by KPI-PDBPs and KPI-VPSOs, where the KPIs (key performance indicators: error (ERR), accuracy (ACC) and $F_1$ score) are the objectives driving the search process. An incremental approach examines the impact of the number of hidden neurons on the performance of SNNs learned/optimised by KPI-PDBPs and KPI-VPSOs. From an engineering perspective, every sensor is well justified for a specific task, so all sensor readings should be strongly correlated to the target, and the structure of an SNN should therefore depend on the dimension of the problem space. The experimental results show that a number of hidden neurons up to the dimension of the problem space is sufficient; that the learnability of SNNs produced by KPI-PDBP is better than that of SNNs optimised by KPI-VPSO in terms of performance and learning time on the training data sets; that the robustness of SNNs learned by KPI-PDBPs and KPI-VPSOs depends on the data sets; and that, compared with other classic machine learning models, ACC-PDBPs win on almost all tested data sets.
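To make the setup concrete, the following is a minimal Python/NumPy sketch of how a KPI-driven swarm search might optimise the flat weight vector of a one-hidden-layer sigmoid SNN. It is an illustration under stated assumptions, not the authors' KPI-PDBP/KPI-VPSO implementation: the function names, the plain global-best PSO update, and the parameter values are hypothetical choices of ours.

```python
# A minimal, illustrative sketch (not the paper's exact KPI-PDBP/KPI-VPSO code):
# a one-hidden-layer sigmoid SNN whose flattened weights are optimised by a
# plain global-best PSO against a chosen KPI (ERR, ACC, or F1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def snn_forward(w, X, n_hidden):
    """One-hidden-layer sigmoid SNN; w is a flat weight vector."""
    d = X.shape[1]
    W1 = w[: d * n_hidden].reshape(d, n_hidden)
    b1 = w[d * n_hidden : d * n_hidden + n_hidden]
    W2 = w[d * n_hidden + n_hidden : d * n_hidden + 2 * n_hidden]
    b2 = w[-1]
    h = sigmoid(X @ W1 + b1)
    return sigmoid(h @ W2 + b2)

def kpi(y_true, y_prob, name="ACC"):
    """KPI objectives driving the search, expressed so lower is better."""
    y_pred = (y_prob >= 0.5).astype(int)
    if name == "ERR":
        return np.mean((y_true - y_prob) ** 2)           # mean squared error
    if name == "ACC":
        return 1.0 - np.mean(y_pred == y_true)           # minimise 1 - ACC
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    return 1.0 - 2 * precision * recall / (precision + recall + 1e-12)  # 1 - F1

def pso_train(X, y, n_hidden, n_particles=30, iters=200, kpi_name="ACC", seed=0):
    """Global-best PSO over the flat SNN weight vector (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1] * n_hidden + 2 * n_hidden + 1
    pos = rng.uniform(-1, 1, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array(
        [kpi(y, snn_forward(p, X, n_hidden), kpi_name) for p in pos])
    gbest = pbest[np.argmin(pbest_cost)].copy()
    w_inertia, c1, c2 = 0.729, 1.494, 1.494   # commonly used constriction values
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w_inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        cost = np.array(
            [kpi(y, snn_forward(p, X, n_hidden), kpi_name) for p in pos])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
        gbest = pbest[np.argmin(pbest_cost)].copy()
    return gbest
```

Wrapping `pso_train` in a loop over `n_hidden = 1, ..., d` would mirror the incremental approach described above for probing how many hidden neurons a d-dimensional problem actually needs.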
