Recent Advances in Large Margin Learning

This paper serves as a survey of recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs) that are probably the most prominent machine learning models for large-scale data in the community over the past decade. We generalize the formulation of classification margins from classical research to latest DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and introduce recent efforts in enlarging the margins for DNNs comprehensively. Since the viewpoint of different methods is discrepant, we categorize them into groups for ease of comparison and discussion in the paper. Hopefully, our discussions and overview inspire new research work in the community that aim to improve the performance of DNNs, and we also point to directions where the large margin principle can be verified to provide theoretical evidence why certain regularizations for DNNs function well in practice. We managed to shorten the paper such that the crucial spirit of large margin learning and related methods are better emphasized.

[1]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[2]  Seyed-Mohsen Moosavi-Dezfooli,et al.  Robustness via Curvature Regularization, and Vice Versa , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Masashi Sugiyama,et al.  Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks , 2018, NeurIPS.

[4]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[5]  Corina S. Pasareanu,et al.  Fast Geometric Projections for Local Robustness Certification , 2020, ICLR.

[6]  Matthias Hein,et al.  Provable Robustness of ReLU networks via Maximization of Linear Regions , 2018, AISTATS.

[7]  Zhi-Hua Zhou,et al.  Optimal Margin Distribution Machine , 2016, IEEE Transactions on Knowledge and Data Engineering.

[8]  Ran El-Yaniv,et al.  Binarized Neural Networks , 2016, NIPS.

[9]  Po-Ling Loh,et al.  Does Data Augmentation Lead to Positive Margin? , 2019, ICML.

[10]  Jieping Ye,et al.  Margin Based PU Learning , 2018, AAAI.

[11]  Nathan Srebro,et al.  Convergence of Gradient Descent on Separable Data , 2018, AISTATS.

[12]  Russ Tedrake,et al.  Evaluating Robustness of Neural Networks with Mixed Integer Programming , 2017, ICLR.

[13]  Mykel J. Kochenderfer,et al.  Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks , 2017, CAV.

[14]  Matus Telgarsky,et al.  The implicit bias of gradient descent on nonseparable data , 2019, COLT.

[15]  A. Robert Calderbank,et al.  Discriminative Robust Transformation Learning , 2015, NIPS.

[16]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Chih-Hong Cheng,et al.  Maximum Resilience of Artificial Neural Networks , 2017, ATVA.

[18]  Colin Wei,et al.  Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation , 2019, NeurIPS.

[19]  Hossein Mobahi,et al.  Predicting the Generalization Gap in Deep Networks with Margin Distributions , 2018, ICLR.

[20]  Changshui Zhang,et al.  Deep Defense: Training DNNs with Improved Adversarial Robustness , 2018, NeurIPS.

[21]  Alexander J. Smola,et al.  Advances in Large Margin Classifiers , 2000 .

[22]  Ruosong Wang,et al.  On Exact Computation with an Infinitely Wide Neural Net , 2019, NeurIPS.

[23]  David A. McAllester,et al.  A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks , 2017, ICLR.

[24]  Haifeng Shen,et al.  Virtual Class Enhanced Discriminative Embedding Learning , 2018, NeurIPS.

[25]  Changshui Zhang,et al.  Sparse DNNs with Improved Adversarial Robustness , 2018, NeurIPS.

[26]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[27]  Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training , 2019, ArXiv.

[28]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[29]  Shifeng Zhang,et al.  Support Vector Guided Softmax Loss for Face Recognition , 2018, ArXiv.

[30]  Guillermo Sapiro,et al.  Robust Large Margin Deep Neural Networks , 2016, IEEE Transactions on Signal Processing.

[31]  Lei Zhang,et al.  Large Margin Few-Shot Learning , 2018, ArXiv.

[32]  Yichuan Tang,et al.  Deep Learning using Linear Support Vector Machines , 2013, 1306.0239.

[33]  Yifan Gong,et al.  Recurrent support vector machines for speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[34]  Nathan Srebro,et al.  Learning Optimally Sparse Support Vector Machines , 2013, ICML.

[35]  Colin Wei,et al.  Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss , 2019, NeurIPS.

[36]  Matus Telgarsky,et al.  Gradient descent aligns the layers of deep linear networks , 2018, ICLR.

[37]  Cho-Jui Hsieh,et al.  Efficient Neural Network Robustness Certification with General Activation Functions , 2018, NeurIPS.

[38]  Yiran Chen,et al.  Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[39]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[40]  Amit Daniely,et al.  SGD Learns the Conjugate Kernel Class of the Network , 2017, NIPS.

[41]  Tommi S. Jaakkola,et al.  Towards Robust, Locally Linear Deep Networks , 2019, ICLR.

[42]  Nathan Srebro,et al.  Exploring Generalization in Deep Learning , 2017, NIPS.

[43]  Matthew Mirman,et al.  Fast and Effective Robustness Certification , 2018, NeurIPS.

[44]  Nathan Srebro,et al.  The Implicit Bias of Gradient Descent on Separable Data , 2017, J. Mach. Learn. Res..

[45]  Yoshua Bengio,et al.  Challenges in representation learning: A report on three machine learning contests , 2013, Neural Networks.

[46]  Yurong Chen,et al.  Dynamic Network Surgery for Efficient DNNs , 2016, NIPS.

[47]  Kaisheng Yao,et al.  Deep neural support vector machines for speech recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[48]  Robert Tibshirani,et al.  1-norm Support Vector Machines , 2003, NIPS.

[49]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[50]  Hossein Mobahi,et al.  Large Margin Deep Networks for Classification , 2018, NeurIPS.

[51]  Shenghua Gao,et al.  Margin Learning Embedded Prediction for Video Anomaly Detection with A Few Anomalies , 2019, IJCAI.

[52]  Yaoliang Yu,et al.  Understanding Adversarial Robustness: The Trade-off between Minimum and Average Margin , 2019, ArXiv.

[53]  Bernhard Schölkopf,et al.  New Support Vector Algorithms , 2000, Neural Computation.

[54]  Seyed-Mohsen Moosavi-Dezfooli,et al.  DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Song Han,et al.  DSD: Dense-Sparse-Dense Training for Deep Neural Networks , 2016, ICLR.

[56]  Andrew M. Saxe,et al.  Minnorm training: an algorithm for training overcomplete deep neural networks , 2018 .

[57]  Guillermo Sapiro,et al.  Generalization Error of Invariant Classifiers , 2016, AISTATS.

[58]  J. Zico Kolter,et al.  Certified Adversarial Robustness via Randomized Smoothing , 2019, ICML.

[59]  John Shawe-Taylor,et al.  Generalization Performance of Support Vector Machines and Other Pattern Classifiers , 1999 .

[60]  Samy Bengio,et al.  Adversarial Machine Learning at Scale , 2016, ICLR.

[61]  Sijia Liu,et al.  CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks , 2018, AAAI.

[62]  Long Chen,et al.  On Connections Between Regularizations for Improving DNN Robustness , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[64]  Tie-Yan Liu,et al.  On the Depth of Deep Neural Networks: A Theoretical View , 2015, AAAI.

[65]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[66]  Mohammed Bennamoun,et al.  Contractive Rectifier Networks for Nonlinear Maximum Margin Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[67]  Stephen Wright,et al.  Convergence and Margin of Adversarial Training on Separable Data , 2019, ArXiv.

[68]  Vladimir Vapnik,et al.  An overview of statistical learning theory , 1999, IEEE Trans. Neural Networks.

[69]  Wei Chen,et al.  SupportNet: solving catastrophic forgetting in class incremental learning with support data , 2018, ArXiv.

[70]  Jinfeng Yi,et al.  Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach , 2018, ICLR.

[71]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[72]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Colin Wei,et al.  Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin , 2019, ICLR.

[74]  Lin Xu,et al.  Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights , 2017, ICLR.

[75]  Arthur Jacot,et al.  Neural tangent kernel: convergence and generalization in neural networks (invited paper) , 2018, NeurIPS.

[76]  Jascha Sohl-Dickstein,et al.  Sensitivity and Generalization in Neural Networks: an Empirical Study , 2018, ICLR.

[77]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[78]  J. Zico Kolter,et al.  Provable defenses against adversarial examples via the convex outer adversarial polytope , 2017, ICML.

[79]  Matus Telgarsky,et al.  Spectrally-normalized margin bounds for neural networks , 2017, NIPS.

[80]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[81]  Seyed-Mohsen Moosavi-Dezfooli,et al.  Universal Adversarial Perturbations , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[82]  Matthias Hein,et al.  Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation , 2017, NIPS.

[83]  Shie Mannor,et al.  Robustness and generalization , 2010, Machine Learning.

[84]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[85]  Jian Cheng,et al.  Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.

[86]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[87]  Nathan Srebro,et al.  Implicit Bias of Gradient Descent on Linear Convolutional Networks , 2018, NeurIPS.

[88]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[89]  Alan L. Yuille,et al.  Adversarial Examples for Semantic Segmentation and Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[90]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[91]  Andrew Slavin Ross,et al.  Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients , 2017, AAAI.

[92]  Shie Mannor,et al.  Robustness and Regularization of Support Vector Machines , 2008, J. Mach. Learn. Res..

[93]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[94]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[95]  Takumi Kobayashi,et al.  Large Margin In Softmax Cross-Entropy Loss , 2019, BMVC.

[96]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[97]  Colin Wei,et al.  Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel , 2018, NeurIPS.

[98]  Ronald Kemker,et al.  Measuring Catastrophic Forgetting in Neural Networks , 2017, AAAI.

[99]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[100]  Raja Giryes,et al.  Improving DNN Robustness to Adversarial Attacks using Jacobian Regularization , 2018, ECCV.

[101]  Yuichi Yoshida,et al.  Spectral Norm Regularization for Improving the Generalizability of Deep Learning , 2017, ArXiv.

[102]  Medhat A. Moussa,et al.  Attacking Binarized Neural Networks , 2017, ICLR.

[103]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[104]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[105]  Shifeng Zhang,et al.  Ensemble Soft-Margin Softmax Loss for Image Classification , 2018, IJCAI.

[106]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[107]  Weihong Deng,et al.  Adversarial Learning With Margin-Based Triplet Embedding Regularization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[108]  Leslie Pack Kaelbling,et al.  Generalization in Deep Learning , 2017, ArXiv.

[109]  Meng Yang,et al.  Large-Margin Softmax Loss for Convolutional Neural Networks , 2016, ICML.

[110]  Pushmeet Kohli,et al.  A Dual Approach to Scalable Verification of Deep Networks , 2018, UAI.

[111]  Charles Elkan,et al.  Learning classifiers from only positive and unlabeled data , 2008, KDD.

[112]  Anders Krogh,et al.  A Simple Weight Decay Can Improve Generalization , 1991, NIPS.

[113]  Yiwen Guo,et al.  Adversarial Margin Maximization Networks , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[114]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[115]  Dong Yu,et al.  Large Margin Training for Attention Based End-to-End Speech Recognition , 2019, INTERSPEECH.

[116]  Ruitong Huang,et al.  Max-Margin Adversarial (MMA) Training: Direct Input Space Margin Maximization through Adversarial Training , 2018, ICLR.