Deep Domain Adaptation for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labeled Videos

Automatic estimation of pain intensity from facial expressions in videos has immense potential in health care applications. However, domain adaptation (DA) is needed to alleviate the domain shifts that typically occur between video data captured in source and target domains. Given the laborious task of collecting and annotating videos, and the subjective bias due to ambiguity among adjacent intensity levels, weakly-supervised learning (WSL) is gaining attention in such applications. Yet most state-of-the-art WSL models are formulated as regression problems, and leverage neither the ordinal relation between intensity levels nor the temporal coherence of multiple consecutive frames. This paper introduces a new deep learning model for weakly-supervised DA with ordinal regression (WSDA-OR), where videos in the target domain have coarse labels provided on a periodic basis. The WSDA-OR model enforces ordinal relationships among the intensity levels assigned to the target sequences, and associates multiple relevant frames (instead of a single frame) with each sequence-level label. In particular, it learns discriminant and domain-invariant feature representations by integrating multiple instance learning with deep adversarial DA, where soft Gaussian labels are used to efficiently represent the weak ordinal sequence-level labels from the target domain. The proposed approach was validated using the RECOLA video dataset as the fully-labeled source domain and the UNBC-McMaster video dataset as the weakly-labeled target domain. We have also validated WSDA-OR on the BIOVID and Fatigue (private) datasets for sequence-level estimation. Experimental results indicate that our approach can provide a significant improvement over state-of-the-art models, achieving greater localization accuracy.
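To make the idea of soft Gaussian labels concrete, the following is a minimal sketch under stated assumptions, not the WSDA-OR implementation. The abstract does not give the exact formulation, so the function names (`gaussian_soft_label`, `soft_cross_entropy`), the normalization, the choice of six intensity levels, and the `sigma` values are all hypothetical. The sketch only illustrates how spreading a hard ordinal label over neighbouring levels makes a prediction near the true level cost less than one far from it.

```python
import math


def gaussian_soft_label(level, num_levels, sigma=1.0):
    """Spread a hard ordinal label over all levels with a Gaussian.

    Hypothetical helper: `sigma` controls how much probability mass
    leaks to adjacent intensity levels (an assumption, not a value
    taken from the paper).
    """
    weights = [
        math.exp(-((k - level) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_levels)
    ]
    total = sum(weights)
    return [w / total for w in weights]  # normalize to sum to 1


def soft_cross_entropy(probs, soft_target, eps=1e-12):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, soft_target))


# Soft target for a sequence weakly labeled with intensity level 2 of 6.
target = gaussian_soft_label(level=2, num_levels=6, sigma=1.0)

# Two stand-in predictions: one peaked next to the label, one far away.
near = gaussian_soft_label(level=3, num_levels=6, sigma=0.5)
far = gaussian_soft_label(level=5, num_levels=6, sigma=0.5)

loss_near = soft_cross_entropy(near, target)
loss_far = soft_cross_entropy(far, target)
```

Under these assumptions `loss_near` is smaller than `loss_far`, which is the ordinal behaviour a one-hot target would not provide: with a hard label, both wrong predictions would be penalized without regard to their distance from the true level.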
