Deep Ordinal Classification with Inequality Constraints

This study investigates a new constrained-optimization formulation for deep ordinal classification. We impose uni-modality of the label distribution implicitly, via a set of inequality constraints over pairs of adjacent labels. To tackle the ensuing challenging optimization problem, we solve a sequence of unconstrained losses based on a powerful extension of the log-barrier method. This accommodates standard SGD for deep networks and avoids computationally expensive Lagrangian dual steps and projections, while substantially outperforming penalty methods. Our non-parametric model is more flexible than existing deep ordinal classification techniques: it does not restrict the learned representation to a specific parametric model, which allows the training to explore larger spaces of solutions and removes the need for ad hoc choices, while scaling up to large numbers of labels. It can be used in conjunction with any standard classification loss and any deep architecture. We also propose a new performance metric for ordinal classification, referred to as the Sides Order Index (SOI), as a proxy for measuring the uni-modality of a distribution. We report comprehensive evaluations and comparisons to state-of-the-art methods on benchmark public datasets for several ordinal classification tasks, showing the merits of our approach in terms of label consistency and scalability. A public reproducible PyTorch implementation is provided (this https URL).
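
To make the idea of adjacent-label inequality constraints handled through a log-barrier extension more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not spell out the exact constraints or the exact barrier, so both are assumptions here: scores are assumed to be non-decreasing up to the true label and non-increasing after it, and the barrier is assumed to be a standard log-barrier continued by its tangent line so that it stays finite when a constraint is violated. The function names and the parameter `t` are hypothetical.

```python
import math


def log_barrier_extension(z, t):
    """Assumed extended log-barrier for a constraint of the form z <= 0.

    Standard log-barrier for z <= -1/t^2; beyond that point it is continued
    by its tangent line, so the value is finite even when z > 0.
    """
    threshold = -1.0 / (t * t)
    if z <= threshold:
        return -math.log(-z) / t
    # Tangent continuation: matches value and slope of the barrier at the threshold.
    return t * z - math.log(1.0 / (t * t)) / t + 1.0 / t


def adjacent_constraints(scores, label):
    """Return one value per pair of adjacent labels; each should be <= 0.

    Assumption: a unimodal score vector rises up to `label` and falls after it.
    """
    values = []
    for j in range(len(scores) - 1):
        if j < label:
            values.append(scores[j] - scores[j + 1])  # want scores[j] <= scores[j+1]
        else:
            values.append(scores[j + 1] - scores[j])  # want scores[j+1] <= scores[j]
    return values


def unimodality_term(scores, label, t=5.0):
    """Sum of barrier values over all adjacent-label constraints."""
    return sum(log_barrier_extension(z, t) for z in adjacent_constraints(scores, label))


if __name__ == "__main__":
    unimodal = [0.1, 0.5, 0.3]   # peaks at label 1
    bimodal = [0.5, 0.1, 0.4]    # second bump violates the ordering for label 0
    print(unimodality_term(unimodal, label=1))
    print(unimodality_term(bimodal, label=0))
```

In a training loop, a term of this kind would be added to a standard classification loss, with `t` increased over the course of training so that the sequence of unconstrained losses approaches the constrained problem. The schedule for `t` is likewise an assumption, not something stated in the abstract.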
