How Does Loss Function Affect Generalization Performance of Deep Learning? Application to Human Age Estimation

Generalizing well across a wide variety of domains, whose differences arise from many external and internal factors, is the fundamental goal of any machine learning algorithm. This paper proves theoretically that the choice of loss function matters for the generalization performance of deep learning-based systems. By deriving a generalization error bound for deep neural models trained by stochastic gradient descent, we pinpoint the characteristics of the loss function that are linked to the generalization error and can therefore guide the selection of a loss function. In summary, our main statement in this paper is: choose a stable loss function, generalize better. Focusing on human age estimation from the face, which is a challenging topic in computer vision, we then propose a novel loss function for this learning problem. We prove that the proposed loss function achieves stronger stability, and consequently a tighter generalization error bound, than other loss functions commonly used for this problem. We support our findings theoretically and demonstrate the merits of the guidance process experimentally, achieving significant improvements.
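
To make the link between stability and generalization concrete, the following is a sketch of the standard uniform-stability argument for stochastic gradient descent. It is not the bound derived in this paper. The notation ($A$ for the randomized learning algorithm, $S$ and $S'$ for training sets of size $n$ that differ in one example, $\ell$ for the loss, $R_S$ and $R$ for empirical and population risk, $L$ for the Lipschitz constant, $\beta$ for the smoothness constant, $\alpha_t$ for the step size at step $t$, $T$ for the number of steps) is assumed here for illustration, and the last line holds only for a convex loss, which deep networks do not satisfy.

```latex
\begin{align}
  % Uniform stability: changing one training example moves the loss by at most eps
  \sup_{z}\; \mathbb{E}_{A}\bigl[\ell(A(S); z) - \ell(A(S'); z)\bigr]
    &\le \epsilon_{\mathrm{stab}}, \\
  % Consequence: the expected generalization gap is bounded by the same eps
  \Bigl|\,\mathbb{E}_{S,A}\bigl[R_S(A(S)) - R(A(S))\bigr]\Bigr|
    &\le \epsilon_{\mathrm{stab}}, \\
  % Convex, L-Lipschitz, beta-smooth loss; SGD with alpha_t <= 2/beta for T steps
  \epsilon_{\mathrm{stab}}
    &\le \frac{2L^{2}}{n} \sum_{t=1}^{T} \alpha_t .
\end{align}
```

In this illustrative setting the bound grows with $L^{2}$, so a loss with a smaller Lipschitz constant yields a smaller stability constant and hence a tighter bound on the expected generalization gap, which is the intuition behind "choose a stable loss function, generalize better".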
