A systematic review on overfitting control in shallow and deep neural networks

Shallow neural networks process features directly, while deep networks extract features automatically during training. Both kinds of models often suffer from overfitting, i.e., poor generalization. Deep networks have more hyper-parameters than shallow ones, which increases the probability of overfitting. This paper presents a systematic review of overfitting control methods and categorizes them into passive, active, and semi-active subsets. A passive method designs the neural network before training; an active method adapts the network during the training process; and a semi-active method redesigns the network when training performance is poor. The review covers the theoretical and experimental backgrounds of these methods, their strengths and weaknesses, and emerging techniques for overfitting detection. Adapting model complexity to data complexity is another theme of the review, and the relations among overfitting control, regularization, network compression, and network simplification are also discussed. The paper ends with concluding lessons drawn from the literature.
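As a concrete illustration of the active category (adapting training as it proceeds), consider early stopping on a validation-loss curve. The following is a minimal, framework-free sketch; the class name, `patience` value, and simulated loss curve are illustrative assumptions, not details taken from the reviewed methods.

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs.

    This is an illustrative sketch of one "active" overfitting control;
    the parameter names follow common convention, not any specific library.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # allowed epochs without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")      # best validation loss seen so far
        self.bad_epochs = 0           # consecutive epochs without improvement

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Simulated validation-loss curve: the model improves, then starts to overfit
# (validation loss rises while training loss would keep falling).
losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.58, 0.60, 0.63]
stopper = EarlyStopping(patience=3)
stop_epoch = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stop_epoch = epoch
        break
```

Here training halts at epoch 6, three epochs after the best validation loss (0.55) at epoch 3, so the weights kept would be those from before the overfitting phase. A passive method, by contrast, would fix the control before training (e.g., a weight-decay coefficient), and a semi-active method would restructure the network itself after observing poor performance.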
