Order Regularization on Ordinal Loss for Head Pose, Age and Gaze Estimation

Ordinal loss is widely used to solve regression problems with deep learning. Its basic idea is to convert regression into classification while preserving the natural order of the labels. However, the order constraint is enforced only implicitly, through the ordinal labels, so the actual output values are not guaranteed to be in order. As a result, the network learns separable features rather than discriminative ones and may overfit the training set. In this paper, we propose order regularization on ordinal loss, which puts the outputs in order by explicitly constraining the ordinal classifiers to be in order. The proposed method has two parts: a similar-weights constraint, which reduces the ineffective space between classifiers, and a differential-bias constraint, which enforces the order of the decision planes and enhances the discriminative power of the classifiers. Experimental results show that the proposed method improves on the original ordinal loss across regression problems such as head pose, age, and gaze estimation, with a significant error reduction of around 5%. Furthermore, our method outperforms the state of the art on all these tasks, with performance gains of 14.4%, 2.2%, and 6.5% on head pose, age, and gaze estimation, respectively.

Introduction

Benefiting from their strong feature representation ability, convolutional neural networks (CNNs) are widely used to solve regression problems such as head pose (Yang et al. 2019; Ruiz, Chong, and Rehg 2018), age (Li et al. 2019; Chen et al. 2017; Zhang et al. 2017b), gaze (Park et al. 2019; Krafka et al. 2016; Cheng et al. 2020), and depth estimation (Fu et al. 2018). Most researchers prefer enhanced Softmax (Gao et al. 2017) or ordinal loss (Chen et al. 2017; Fu et al. 2018) to L2 loss, because these loss functions quantize continuous values into discrete ones, converting the regression problem into a classification problem that is less sensitive to outliers than L2 loss.
Among them, ordinal loss stands out because it preserves a key property of regression: the farther the prediction is from the ground truth, the larger the penalty. To employ the ordinal loss, a continuous value $gt$ is converted to an ordinal label $y$, a vector of length $N$, using the following formula:

$$y_n = \begin{cases} 1, & \text{if } (n+1) \cdot \mathrm{BinSize} + R_{\min} \le gt \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where $y_n$ ($0 \le n < N$) is the n-th component of $y$, and BinSize quantizes the regression range $[R_{\min}, R_{\max}]$ into $N + 1$ intervals. Each $y_n$ has a corresponding binary classifier, i.e. an ordinal classifier, and the ordinal loss is defined as the cross-entropy loss that supervises all $N$ binary classifiers with $y$.

*Work was done when they were employed by SRC-B. Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

We now focus on the ordinal classifier. The decision plane of the n-th classifier is denoted as $g(w_n, b_n) := w_n x + b_n$ for a feature $x$ extracted by the CNN. It judges whether the condition in Eq. 1 is satisfied. As the regression value grows, more and more classifiers output 1 in sequence. Intuitively, then, the following constraint should hold:

$$g(w_0, b_0) \ge g(w_1, b_1) \ge \cdots \ge g(w_{N-1}, b_{N-1})$$

We call this the implicit order constraint in ordinal loss.

However, this constraint may not be satisfied in practice. We observed that the values computed with the decision planes are not strictly in order, as shown in Fig. 1(a). This invalid order problem may make the classifiers prone to overfitting, since the learned features are separable rather than discriminative. Fig. 1(b) shows a 2D geometric interpretation with a toy model consisting of three classifiers. The training samples (represented as black shapes) can be perfectly classified, but a decision plane that merely separates them leaves little margin for unseen samples. Thus a test sample in the star category (i.e.
the red star) may be misclassified into the circle category, crossing several planes, which incurs a larger error than misclassification into the neighbouring triangle category.

In this paper, we propose an order regularization that constrains the order of the classifiers explicitly. The basic idea is that, for a given $x$, the output values $w_n x + b_n$, $n = 0, 1, \ldots, N-1$, can be kept in order by constraining the decision planes to be in order. To achieve this, we first make the weights of all decision planes similar by introducing the similar-weights constraint, i.e. $w_0 \approx w_1 \approx \cdots \approx w_{N-1}$. Second, we put all the biases $b_n$, $n = 0, 1, \ldots, N-1$, in order by introducing the differential-bias constraint. The 2D geometric interpretation is shown in

The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
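To make the label encoding of Eq. 1 and the implicit order constraint from the introduction concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`ordinal_label`, `decision_values`, `is_ordered`) are hypothetical, BinSize is assumed to be (Rmax − Rmin) / (N + 1) as implied by the "N + 1 intervals" statement, and the toy weights and biases are invented for the demonstration.

```python
from typing import List, Sequence


def ordinal_label(gt: float, r_min: float, r_max: float, n: int) -> List[int]:
    """Encode a continuous value gt as an N-dimensional ordinal label (Eq. 1)."""
    # Assumption: BinSize splits [r_min, r_max] into N + 1 equal intervals.
    bin_size = (r_max - r_min) / (n + 1)
    # Component k is 1 when the k-th threshold does not exceed gt.
    return [1 if (k + 1) * bin_size + r_min <= gt else 0 for k in range(n)]


def decision_values(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """Compute g(w_n, b_n) = w_n . x + b_n for every ordinal classifier."""
    return [
        sum(wi * xi for wi, xi in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]


def is_ordered(values: Sequence[float]) -> bool:
    """Check the order constraint g_0 >= g_1 >= ... >= g_{N-1}."""
    return all(a >= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Range [0, 100] with N = 9 classifiers gives thresholds 10, 20, ..., 90.
    print(ordinal_label(35.0, 0.0, 100.0, 9))  # [1, 1, 1, 0, 0, 0, 0, 0, 0]

    x = [3.0, 1.0]          # toy 2D feature (hypothetical values)
    biases = [2.0, 1.0, 0.0]  # biases in decreasing order

    # Identical weights plus ordered biases: outputs are ordered for any x.
    shared = [[0.5, -1.0]] * 3
    print(is_ordered(decision_values(x, shared, biases)))

    # Unconstrained weights: ordered biases alone do not guarantee order.
    free = [[0.5, -1.0], [2.0, 0.0], [0.0, 0.0]]
    print(is_ordered(decision_values(x, free, biases)))
```

The second half of the demonstration illustrates why both constraints are needed together: when all classifiers share the same weight vector, the outputs differ only by their biases, so ordering the biases orders the outputs for every feature; once the weights diverge, the same biases no longer guarantee an ordered output.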

[1]  Fei Wang,et al.  A Coarse-to-Fine Adaptive Network for Appearance-Based Gaze Estimation , 2020, AAAI.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Dacheng Tao,et al.  Deep Ordinal Regression Network for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Jianxin Wu,et al.  Deep Label Distribution Learning With Label Ambiguity , 2016, IEEE Transactions on Image Processing.

[5]  Yiannis Demiris,et al.  RT-GENE: Real-Time Eye Gaze Estimation in Natural Environments , 2018, ECCV.

[6]  Bo Wang,et al.  Deep Regression Forests for Age Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Yung-Yu Chuang,et al.  FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation From a Single Image , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ming Dong,et al.  Using Ranking-CNN for Age Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  James M. Rehg,et al.  Fine-Grained Head Pose Estimation Without Keypoints , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[10]  Shiguang Shan,et al.  Mean-Variance Loss for Deep Age Estimation from a Face , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Qiang Wang,et al.  HCR-Net: A Hybrid of Classification and Regression Network for Object Pose Estimation , 2018, IJCAI.

[12]  Yi-Ping Hung,et al.  Ordinal hyperplanes ranker with cost sensitivities for age estimation , 2011, CVPR 2011.

[13]  Wojciech Matusik,et al.  Eye Tracking for Everyone , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Amit Marathe,et al.  Soft Labels for Ordinal Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Shiguang Shan,et al.  Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Li Liu,et al.  Quantifying Facial Age by Posterior of Age Comparisons , 2017, BMVC.

[18]  Luc Van Gool,et al.  Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks , 2016, International Journal of Computer Vision.

[19]  Hui Zhang,et al.  A Generalized and Robust Method Towards Practical Gaze Estimation on Smart Phone , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[20]  Otmar Hilliges,et al.  Deep Pictorial Gaze Estimation , 2018, ECCV.

[21]  Jiwen Lu,et al.  BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Karl Ricanek,et al.  MORPH: a longitudinal image database of normal adult age-progression , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[23]  Chi Keong Goh,et al.  A Constrained Deep Neural Network for Ordinal Regression , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Mario Fritz,et al.  It’s Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[25]  Kai Zhao,et al.  Label Distribution Learning Forests , 2017, NIPS.

[26]  Adams Wai-Kin Kong,et al.  Probabilistic Deep Ordinal Regression Based on Gaussian Processes , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Jan Kautz,et al.  Few-Shot Adaptive Gaze Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Changxing Ding,et al.  Soft-Ranking Label Encoding for Robust Facial Age Estimation , 2019, IEEE Access.

[29]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[30]  Hyunwoo Kim,et al.  Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Mario Fritz,et al.  MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Josef Kittler,et al.  Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Gang Hua,et al.  Ordinal Regression with Multiple Output CNN for Age Estimation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Chi Keong Goh,et al.  Deep Ordinal Regression Based on Data Relationship for Small Datasets , 2017, IJCAI.

[36]  Luc Van Gool,et al.  Random Forests for Real Time 3D Face Analysis , 2012, International Journal of Computer Vision.

[37]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Mario Fritz,et al.  Appearance-based gaze estimation in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Xiu-Shen Wei,et al.  Deep Label Distribution Learning for Apparent Age Estimation , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[40]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.