C-Face: Continuously Reconstructing Facial Expressions by Deep Learning Contours of the Face with Ear-mounted Miniature Cameras

C-Face (Contour-Face) is an ear-mounted wearable sensing technology that uses two miniature cameras to continuously reconstruct facial expressions by deep learning contours of the face. When facial muscles move, the contours of the face change from the point of view of the ear-mounted cameras. These subtle changes are fed into a deep learning model which continuously outputs 42 facial feature points representing the shapes and positions of the mouth, eyes, and eyebrows. To evaluate C-Face, we embedded our technology into headphones and earphones and conducted a user study with nine participants. In this study, we compared the output of our system to the feature points output by a state-of-the-art computer vision library (Dlib) from a front-facing camera. We found that the mean error across all 42 feature points was 0.77 mm for earphones and 0.74 mm for headphones. The mean error for the 20 major feature points capturing the most active areas of the face was 1.43 mm for earphones and 1.39 mm for headphones. The ability to continuously reconstruct facial expressions introduces new opportunities in a variety of applications. As a demonstration, we implemented and evaluated C-Face for two applications: facial expression detection (outputting emojis) and silent speech recognition. We further discuss the opportunities and challenges of deploying C-Face in real-world applications.
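The reported accuracy numbers come from averaging per-point Euclidean errors between the predicted landmarks and the Dlib reference landmarks. A minimal sketch of that metric is shown below; this is an illustrative computation rather than the authors' code, and the example point arrays and offsets are hypothetical.

```python
import numpy as np

def mean_landmark_error(pred, ref):
    """Mean Euclidean distance between corresponding 2D feature points.

    pred, ref: arrays of shape (N, 2), e.g. N = 42 facial feature
    points expressed in millimeters after calibration.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    assert pred.shape == ref.shape and pred.shape[1] == 2
    # Per-point Euclidean distance, then the mean over all points.
    return float(np.linalg.norm(pred - ref, axis=1).mean())

# Hypothetical example: 42 reference points (e.g. from Dlib on a
# front-facing camera) and predictions shifted by a constant offset.
rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 100.0, size=(42, 2))
pred = ref + np.array([0.6, 0.4])  # offset of sqrt(0.52) ~ 0.72 mm per point
print(round(mean_landmark_error(pred, ref), 2))
```

In practice the same function could be restricted to a subset of indices (e.g. the 20 most active points around the mouth and eyes) to obtain the second set of error figures.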

[1]  Yuhan Hu,et al.  ShadowSense , 2020, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol..

[2]  Cheng Zhang,et al.  FingerTrak , 2020, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol..

[3]  Tamás Gábor Csapó,et al.  Ultrasound-Based Silent Speech Interface Using Convolutional and Recurrent Neural Networks , 2019, Acta Acustica united with Acustica.

[4]  Chris Harrison,et al.  Interferi: Gesture Sensing using On-Body Acoustic Interferometry , 2019, CHI.

[5]  Jun Rekimoto,et al.  SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks , 2019, CHI.

[6]  Asifullah Khan,et al.  A survey of the recent architectures of deep convolutional neural networks , 2019, Artificial Intelligence Review.

[7]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Zhi Zhang,et al.  Bag of Tricks for Image Classification with Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yuanchun Shi,et al.  Lip-Interact: Improving Mobile Device Interaction with Silent Speech Commands , 2018, UIST.

[10]  Changsheng Xu,et al.  Joint Pose and Expression Modeling for Facial Expression Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Lijun Yin,et al.  Identity-Adaptive Facial Expression Recognition through Expression Regeneration Using Conditional Generative Adversarial Networks , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[12]  Shang-Hong Lai,et al.  Emotion-Preserving Representation Learning via Generative Adversarial Network for Multi-View Facial Expression Recognition , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[13]  Gregory D. Abowd,et al.  FingerPing: Recognizing Fine-grained Hand Poses using Active Acoustic On-body Sensing , 2018, CHI.

[14]  Huy Phan,et al.  Weighted and Multi-Task Loss for Rare Audio Event Detection , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Maja Pantic,et al.  Visual-Only Recognition of Normal, Whispered and Silent Speech , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16]  Buntarou Shizuki,et al.  CanalSense: Face-Related Movement Recognition System based on Sensing Air Pressure in Ear Canals , 2017, UIST.

[17]  Huy Phan,et al.  DNN and CNN with Weighted and Multi-task Loss Functions for Audio Event Detection , 2017, ArXiv.

[18]  Bodo Urban,et al.  EarFieldSensing: A Novel In-Ear Electric Field Sensing to Enrich Wearable Gesture Input through Facial Expressions , 2017, CHI.

[19]  Georgios Tzimiropoulos,et al.  Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Yaser Sheikh,et al.  Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[22]  Sergio Escalera,et al.  Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-Related Applications , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Anestis Terzis,et al.  Handbook of Camera Monitor Systems: The Automotive Mirror-Replacement Technology based on ISO 16505 , 2016 .

[24]  Kai Kunze,et al.  Facial Expression Recognition in Daily Life by Embedded Photo Reflective Sensors on Smart Eyewear , 2016, IUI.

[25]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Chuan Li,et al.  Approximate Translational Building Blocks for Image Decomposition and Synthesis , 2015, ACM Trans. Graph..

[27]  Dongmei Jiang,et al.  Multimodal Affective Dimension Prediction Using Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks , 2015, AVEC@ACM Multimedia.

[28]  Chongyang Ma,et al.  Facial performance sensing head-mounted display , 2015, ACM Trans. Graph..

[29]  Christian Szegedy,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[30]  Ying Chen,et al.  Combining Multimodal Features with Hierarchical Classifier Fusion for Emotion Recognition in the Wild , 2014, ICMI.

[31]  Maysam Ghovanloo,et al.  The tongue and ear interface: a wearable system for silent speech recognition , 2014, SEMWEB.

[32]  Ping Liu,et al.  Facial Expression Recognition via a Boosted Deep Belief Network , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Aaron C. Courville,et al.  Combining modality specific deep neural networks for emotion recognition in video , 2013, ICMI '13.

[34]  Qiang Ji,et al.  Facial Expression Recognition Using Deep Boltzmann Machine from Thermal Infrared Images , 2013, 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction.

[35]  Paramvir Bahl,et al.  Energy characterization and optimization of image sensing toward continuous mobile vision , 2013, MobiSys '13.

[36]  Jarmo Verho,et al.  Capacitive Measurement of Facial Activity Intensity , 2013, IEEE Sensors Journal.

[37]  Mohamed Chetouani,et al.  Robust continuous prediction of human emotions using multiscale dynamic cues , 2012, ICMI '12.

[38]  Pascal Vincent,et al.  Disentangling Factors of Variation for Facial Expression Recognition , 2012, ECCV.

[39]  Geoffrey E. Hinton,et al.  On deep generative models with applications to recognition , 2011, CVPR 2011.

[40]  Markus Flierl,et al.  Graph-Preserving Sparse Nonnegative Matrix Factorization With Application to Facial Expression Recognition , 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[41]  Anna Gruebler,et al.  Measurement of distal EMG signals using a wearable device for reading facial expressions , 2010, 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology.

[42]  Tanja Schultz,et al.  ICCHP Keynote: Recognizing Silent and Weak Speech Based on Electromyography , 2010, ICCHP.

[43]  Minghua Zhao,et al.  Skin Color Segmentation Based on Improved 2D Otsu and YCgCr , 2010, 2010 International Conference on Electrical and Control Engineering.

[44]  Tanja Schultz,et al.  Modeling coarticulation in EMG-based continuous speech recognition , 2010, Speech Commun..

[45]  J. M. Gilbert,et al.  Silent speech interfaces , 2010, Speech Commun..

[46]  Gérard Chollet,et al.  Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips , 2010, Speech Commun..

[47]  Davis E. King,et al.  Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..

[48]  Marian Stewart Bartlett,et al.  Automatic facial expression recognition for intelligent tutoring systems , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[49]  Subramanian Ramanathan,et al.  Human Facial Expression Recognition using a 3D Morphable Model , 2006, 2006 International Conference on Image Processing.

[50]  Bruce Denby,et al.  Prospects for a Silent Speech Interface using Ultrasound Imaging , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[51]  Rosalind W. Picard,et al.  Expression glasses: a wearable device for facial expression recognition , 1999, CHI Extended Abstracts.

[52]  S. Hochreiter,et al.  Long Short-Term Memory , 1997, Neural Computation.

[53]  Keith Waters,et al.  Computer Facial Animation, Second Edition , 1996 .

[54]  Phil D. Green,et al.  A Wearable Silent Speech Interface based on Magnetic Sensors with Motion-Artefact Removal , 2018, BIODEVICES.

[55]  Anestis Terzis,et al.  Handbook of Camera Monitor Systems , 2016 .

[56]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[57]  Phil D. Green,et al.  Small-vocabulary speech recognition using a silent speech interface based on magnetic sensing , 2013, Speech Commun..

[58]  Jarmo Verho,et al.  Capacitive facial movement detection for human–computer interaction to click by frowning and lifting eyebrows , 2009, Medical & Biological Engineering & Computing.

[59]  Ioannis Pitas,et al.  Facial Expression Recognition in Image Sequences Using Geometric Deformation Features and Support Vector Machines , 2007, IEEE Transactions on Image Processing.

[60]  George N. Votsis,et al.  Emotion recognition in human-computer interaction , 2001, IEEE Signal Process. Mag..

[61]  K. Waters,et al.  Computer facial animation , 1996 .

[62]  Frederick R. Forst,et al.  On robust estimation of the location parameter , 1980 .