Deep learning with time-frequency representation for pulse estimation from facial videos

Accurate pulse estimation is pivotal for assessing the physical condition of human subjects, and facial-video-based pulse estimation has recently gained attention owing to its simplicity. In this work, we develop a novel deep learning approach as the core of pulse (heart rate) estimation with a common RGB camera. Our approach consists of four steps. In Step 1, we detect the face and its landmarks and thereby locate the required facial region of interest (ROI). In Step 2, we extract the sample mean sequences of the R, G, and B channels from the facial ROI and explore three processing schemes for noise removal and signal enhancement. In Step 3, the Short-Time Fourier Transform (STFT) is employed to build 2D Time-Frequency Representations (TFRs) of the sequences. The 2D TFR allows pulse estimation to be formulated as an image-based classification problem, which is solved in Step 4 by a deep Convolutional Neural Network (CNN). Our approach is among the first works to attempt real-time pulse estimation using a deep learning framework. We have developed a pulse database, called Pulse from Face (PFF), and used it to train the CNN. The PFF database will be made publicly available to advance related research. When compared with state-of-the-art pulse estimation approaches on the standard MAHNOB-HCI database, the proposed approach exhibits superior performance.
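To make Steps 2 and 3 concrete, the following is a minimal sketch of how a per-frame channel mean sequence can be turned into a 2D time-frequency representation with an STFT. It is not the authors' implementation: the function names, the 30 fps frame rate, the 128-sample Hann window, the hop of 16 frames, the 0.7 to 4.0 Hz band, and the synthetic 1.2 Hz (72 bpm) input are all assumptions chosen for illustration. The final peak-picking step is only a sanity check on the TFR; in the paper, Step 4 uses a CNN classifier instead.

```python
import cmath
import math


def channel_mean_sequence(frames, channel):
    """Step 2 (sketch): sample mean of one colour channel over the ROI, per frame.

    `frames` is a list of frames; each frame is a list of (r, g, b) ROI pixels.
    """
    return [sum(px[channel] for px in frame) / len(frame) for frame in frames]


def stft_magnitude(signal, fs, win_len=128, hop=16, f_lo=0.7, f_hi=4.0):
    """Step 3 (sketch): magnitude STFT restricted to a plausible pulse band.

    Returns (freqs_hz, tfr), where tfr[t][k] is the magnitude of frame t
    at frequency freqs_hz[k]. All default parameters are assumptions.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (win_len - 1))
              for i in range(win_len)]  # Hann window
    bins = [k for k in range(win_len // 2 + 1)
            if f_lo <= k * fs / win_len <= f_hi]
    freqs_hz = [k * fs / win_len for k in bins]
    tfr = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = signal[start:start + win_len]
        mean = sum(segment) / win_len
        # Remove the local mean so the DC term does not leak into the band.
        frame = [(x - mean) * w for x, w in zip(segment, window)]
        row = []
        for k in bins:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * n / win_len)
                        for n, v in enumerate(frame))
            row.append(abs(coeff))
        tfr.append(row)
    return freqs_hz, tfr


def dominant_bpm(freqs_hz, tfr):
    """Illustrative stand-in for Step 4: pick the strongest time-averaged bin."""
    avg = [sum(row[k] for row in tfr) / len(tfr) for k in range(len(freqs_hz))]
    return 60.0 * freqs_hz[avg.index(max(avg))]


# Synthetic ROI: 4 pixels per frame, green channel modulated at 1.2 Hz (72 bpm).
fs = 30.0
true_hz = 1.2
frames = []
for n in range(300):
    g = 0.5 + 0.02 * math.sin(2.0 * math.pi * true_hz * n / fs)
    frames.append([(0.6, g, 0.4)] * 4)

green = channel_mean_sequence(frames, channel=1)
freqs_hz, tfr = stft_magnitude(green, fs)
bpm = dominant_bpm(freqs_hz, tfr)
```

With these assumed settings the frequency resolution is fs / win_len, roughly 0.23 Hz (about 14 bpm), so the recovered peak can only land on the bin nearest the true rate. A longer window sharpens the frequency axis of the TFR at the cost of time resolution, which is the usual STFT trade-off when choosing the image fed to a classifier.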
