BioFace-3D: Continuous 3D Facial Reconstruction Through Lightweight Single-Ear Biosensors

Over the last decade, facial landmark tracking and 3D reconstruction have gained considerable attention due to numerous applications such as human-computer interaction, facial expression analysis, and emotion recognition. Traditional approaches require users to be confined to a particular location and to face a camera under constrained recording conditions (e.g., without occlusions and under good lighting). This highly restricted setting prevents these approaches from being deployed in many application scenarios involving human motion. In this paper, we propose the first single-earpiece lightweight biosensing system, BioFace-3D, that can unobtrusively, continuously, and reliably sense the entire range of facial movements, track 2D facial landmarks, and further render 3D facial animations. Our single-earpiece biosensing system takes advantage of a cross-modal transfer learning model to transfer the knowledge embodied in a high-grade visual facial landmark detection model to the low-grade biosignal domain. After training, BioFace-3D can perform continuous 3D facial reconstruction directly from the biosignals, without any visual input. By removing the need for a camera positioned in front of the user, this paradigm shift from visual sensing to biosensing opens up new opportunities in many emerging mobile and IoT applications. Extensive experiments involving 16 participants under various settings demonstrate that BioFace-3D can accurately track 53 major facial landmarks with only 1.85 mm average error and 3.38% normalized mean error, which is comparable with most state-of-the-art camera-based solutions. The rendered 3D facial animations, which are consistent with the real human facial movements, further validate the system's capability for continuous 3D facial reconstruction.
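To make the cross-modal transfer concrete, the sketch below illustrates one plausible teacher-student setup: a frozen, high-grade visual landmark detector labels synchronized video frames during training, and a lightweight 1D-CNN learns to regress the same 53 landmarks from the ear-worn biosignals alone, so that inference needs no camera. Everything here is illustrative and assumed rather than the authors' implementation: the names visual_teacher, BiosignalNet, train_step, and normalized_mean_error, the channel count, and the network architecture are placeholders; only the distillation pattern and the normalized-mean-error metric follow the abstract.

```python
import torch
import torch.nn as nn

class BiosignalNet(nn.Module):
    """Hypothetical 1D-CNN student: maps a window of multi-channel
    biosignals to 53 2D facial landmarks (architecture is a placeholder)."""
    def __init__(self, channels=6, num_landmarks=53):
        super().__init__()
        self.num_landmarks = num_landmarks
        self.features = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # collapse the time axis
        )
        self.head = nn.Linear(64, num_landmarks * 2)

    def forward(self, x):                      # x: (batch, channels, samples)
        z = self.features(x).squeeze(-1)       # (batch, 64)
        return self.head(z).view(-1, self.num_landmarks, 2)

def train_step(student, visual_teacher, biosignal, frame, optimizer):
    """One cross-modal distillation step: the frozen visual teacher produces
    landmark pseudo-labels from the synchronized video frame; the student
    regresses the same landmarks from the biosignal alone."""
    with torch.no_grad():
        target = visual_teacher(frame)         # (batch, 53, 2) pseudo-labels
    pred = student(biosignal)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def normalized_mean_error(pred, target, inter_ocular):
    """NME as reported in the abstract: mean per-landmark Euclidean error,
    normalized by each sample's inter-ocular distance, in percent."""
    err = torch.linalg.norm(pred - target, dim=-1).mean(dim=-1)  # (batch,)
    return (err / inter_ocular).mean() * 100
```

Under these assumptions, training loops train_step over synchronized (biosignal, frame) batches with an optimizer such as torch.optim.Adam; after training, only student(biosignal) runs at inference time, matching the paper's claim of reconstruction without any visual input.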
