Beyond Deep Feature Averaging: Sampling Videos Towards Practical Facial Pain Recognition

In hospitals, automatic identification of patients with cameras can greatly broaden the applicability of intelligent patient monitoring. However, patients who are unaware of being monitored do not adjust their behavior, which makes pose variation a challenge. We argue that the frame-wise feature mean cannot characterize the variation among frames. Instead, we propose that a video feature meant to represent subject identity should preserve the overall pose diversity: since pose varies even within a single video, identity is then the only source of variation across videos. Following this variation-disentanglement idea, we present a pose-robust face verification algorithm in which each video is represented as an ensemble of frame-wise CNN features. Another challenge is that patients may move at any time, which makes real-time processing of the video stream a necessity. Instead of using all frames, the algorithm selects key frames by pose quantization, using pose distances to K-means centroids; this reduces the number of feature vectors from hundreds to K while still preserving the overall diversity. We analyze why this video sampling strategy is better than random sampling. An end-to-end face recognition algorithm is developed for real-time patient identification, producing a rank list of one-to-one similarities using the proposed video representation. It works well in practice and generates a private patient dataset on the fly. On the official 5000 video pairs of the public YouTube Faces dataset, our algorithm achieves performance comparable to a state-of-the-art method that averages deep features over all frames. In summary, the main contribution of this paper is a video-versus-video consensus with discriminative metric learning on the fly, which is verified in a working patient monitoring system.

Figure 1: Painful expression can be subtle and short. Detection and measurement are difficult.
Pain level is defined as AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43 [18], following the Prkachin and Solomon pain intensity (PSPI) metric.
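
The key-frame selection described in the abstract (pose quantization by distance to K-means centroids) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-frame pose vector (yaw, pitch, roll), the 128-dimensional features, the value of K, and the function names `kmeans` and `select_key_frames` are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on an (n, d) array; returns the (k, d) centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct input points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids


def select_key_frames(poses, k, seed=0):
    """Return indices of at most k frames, one nearest to each pose centroid."""
    poses = np.asarray(poses, dtype=float)
    centroids = kmeans(poses, k, seed=seed)
    dists = ((poses[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # For each centroid, pick the frame whose pose is closest to it.
    # np.unique sorts the indices and drops duplicates, so fewer than k
    # frames are returned if two centroids share the same nearest frame.
    return np.unique(dists.argmin(axis=0))


# Synthetic stand-ins (assumptions): 300 frames, pose as (yaw, pitch, roll)
# in degrees, and 128-dimensional frame-wise features.
rng = np.random.default_rng(0)
poses = rng.uniform(-60, 60, size=(300, 3))
features = rng.normal(size=(300, 128))

key = select_key_frames(poses, k=9)
video_repr = features[key]  # ensemble of at most K feature vectors
print(key.shape, video_repr.shape)
```

Unlike frame-wise averaging, the selected rows of `features` are kept as a set, so the video representation retains one feature vector per quantized pose rather than collapsing them into a single mean.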

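As a worked example of the pain-level definition in the Figure 1 caption, the PSPI score can be computed per frame from action unit (AU) intensities. The sketch below assumes the usual PSPI coding, in which AU4, AU6, AU7, AU9 and AU10 are scored from 0 to 5 and AU43 (eye closure) is 0 or 1; the function name `pspi` and the sample values are hypothetical.

```python
def pspi(au4, au6, au7, au9, au10, au43):
    """PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43."""
    return au4 + max(au6, au7) + max(au9, au10) + au43


# Hypothetical AU intensities for one frame.
score = pspi(au4=2, au6=3, au7=1, au9=0, au10=2, au43=1)
print(score)  # 2 + max(3, 1) + max(0, 2) + 1 = 8
```

Under this coding the score ranges from 0 (no pain-related activity) to 16.
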