A Mathematical Foundation for Robust Machine Learning based on Bias-Variance Trade-off

A common assumption in machine learning is that samples are independently and identically distributed (i.i.d.). In training, however, the contributions of different samples are not identical: some samples are difficult to learn and some are noisy. These unequal contributions have a considerable effect on training performance. Studies that address unequal sample contributions (e.g., easy, hard, and noisy samples) in learning are usually grouped under robust machine learning (RML). Weighting and regularization are two common techniques in RML. Numerous learning algorithms have been proposed, but their strategies for dealing with easy/hard/noisy samples differ and can even contradict one another. For example, some strategies train on hard samples first, whereas others start with easy samples. A clear comparison of how existing RML algorithms handle different samples is difficult due to the lack of a unified theoretical framework for RML. This study attempts to construct a mathematical foundation for RML based on the bias-variance trade-off theory. A series of definitions and properties are presented and proved. Several classical learning algorithms are also explained and compared, and improvements of existing methods are obtained from the comparison. Finally, a unified method that combines two classical learning strategies is proposed.
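As background (a standard result rather than a contribution of this paper), the bias-variance decomposition on which the proposed foundation builds can be sketched as follows. For a predictor \hat{f}_D fitted on a training set D, a target y = f(x) + \varepsilon with E[\varepsilon] = 0 and Var(\varepsilon) = \sigma^2, the expected squared error at a point x decomposes as

E_{D,\varepsilon}\big[(y - \hat{f}_D(x))^2\big] = \big(E_D[\hat{f}_D(x)] - f(x)\big)^2 + E_D\big[(\hat{f}_D(x) - E_D[\hat{f}_D(x)])^2\big] + \sigma^2,

i.e., expected error = bias^2 + variance + irreducible noise.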

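To make the contradiction between strategies concrete, the minimal sketch below contrasts an easy-first weighting in the spirit of self-paced learning with a hard-first weighting in the spirit of the focal loss. It is an illustration only, not code from the paper; the function names and toy values are hypothetical.

import numpy as np

def self_paced_weights(losses, lam):
    # Easy-first scheme (self-paced learning style): samples whose current
    # loss is below the threshold lam get weight 1, harder samples are
    # temporarily excluded (hard 0/1 weighting).
    return (losses < lam).astype(float)

def focal_weights(probs, gamma=2.0):
    # Hard-first scheme (focal-loss style): well-classified samples with a
    # high predicted probability for the true class are down-weighted by the
    # modulating factor (1 - p)^gamma, so hard samples dominate the loss.
    return (1.0 - probs) ** gamma

# Toy per-sample statistics (illustrative values only).
losses = np.array([0.1, 0.5, 2.3, 0.05])   # current training losses
probs  = np.array([0.95, 0.6, 0.2, 0.99])  # predicted true-class probabilities

print(self_paced_weights(losses, lam=1.0))  # keeps the easy samples: [1. 1. 0. 1.]
print(focal_weights(probs))                 # assigns the largest weight to the hardest sample

The two schemes emphasize opposite ends of the difficulty spectrum, which is precisely the kind of disagreement the proposed bias-variance framework is meant to compare on a common footing.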