Deep domain adaptation with ordinal regression for pain assessment using weakly-labeled videos

Abstract: Estimation of pain intensity from facial expressions captured in videos has immense potential for health care applications. Given the challenges related to subjective variations of facial expressions and to operational capture conditions, the accuracy of state-of-the-art deep learning (DL) models for recognizing facial expressions may decline. Domain adaptation (DA) has been widely explored to alleviate the domain shifts that typically occur between video data captured in source (laboratory) and target (operational) domains. Moreover, given the laborious task of collecting and annotating videos, and the subjective bias due to ambiguity among adjacent intensity levels, weakly-supervised learning (WSL) is gaining attention in such applications. State-of-the-art WSL models are typically formulated as regression problems, and leverage neither the ordinal relationship among pain intensity levels nor the temporal coherence of multiple consecutive frames. This paper introduces a new DL model for weakly-supervised DA with ordinal regression (WSDA-OR) that can be adapted using target domain videos with coarse labels provided on a periodic basis. The WSDA-OR model enforces ordinal relationships among the intensity levels assigned to target sequences, and associates multiple relevant frames (instead of a single frame) with each sequence-level label. In particular, it learns discriminant and domain-invariant feature representations by integrating multiple instance learning with deep adversarial DA, where soft Gaussian labels are used to efficiently represent the weak ordinal sequence-level labels from the target domain. The proposed approach was validated using the RECOLA video dataset as fully-labeled source domain data and the UNBC-McMaster shoulder pain video dataset as weakly-labeled target domain data. We also validated WSDA-OR on the BIOVID and Fatigue (private) datasets for sequence-level estimation.
Experimental results indicate that the proposed approach can significantly improve performance over state-of-the-art models, achieving greater pain localization accuracy. Code is available on GitHub: https://github.com/praveena2j/WSDAOR
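
To make the idea of soft Gaussian labels for ordinal levels concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical setting with 6 discrete pain levels and a standard deviation of 1.0. The function names `gaussian_soft_label` and `soft_cross_entropy` are illustrative only. A coarse level is spread over neighbouring levels with Gaussian weights, so a prediction close to the labeled level is penalized less than one far from it.

```python
import math


def gaussian_soft_label(level, num_levels, sigma=1.0):
    """Return a probability vector over levels 0..num_levels-1, peaked at `level`.

    num_levels and sigma are illustrative assumptions, not values from the paper.
    """
    if not 0 <= level < num_levels:
        raise ValueError("level must lie in [0, num_levels)")
    # Unnormalized Gaussian weight for every ordinal level.
    weights = [
        math.exp(-((k - level) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_levels)
    ]
    total = sum(weights)
    # Normalize so the weights form a valid probability distribution.
    return [w / total for w in weights]


def soft_cross_entropy(pred_probs, soft_target, eps=1e-12):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p + eps) for p, t in zip(pred_probs, soft_target))


# Soft target for a sequence weakly labeled with level 3 out of 6 levels.
soft = gaussian_soft_label(level=3, num_levels=6, sigma=1.0)

# Two hypothetical predictions: one peaked next to the label, one far away.
near_pred = gaussian_soft_label(level=4, num_levels=6, sigma=1.0)
far_pred = gaussian_soft_label(level=0, num_levels=6, sigma=1.0)

loss_near = soft_cross_entropy(near_pred, soft)
loss_far = soft_cross_entropy(far_pred, soft)
```

In this sketch `loss_near` is smaller than `loss_far`, which is the ordinal behaviour a one-hot target would not provide: with a one-hot label, both wrong predictions would be scored only by the probability they assign to level 3.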
