C-Face: Continuously Reconstructing Facial Expressions by Deep Learning Contours of the Face with Ear-mounted Miniature Cameras

C-Face (Contour-Face) is an ear-mounted wearable sensing technology that uses two miniature cameras to continuously reconstruct facial expressions by deep learning contours of the face. When facial muscles move, the contours of the face change from the point of view of the ear-mounted cameras. These subtle changes are fed into a deep learning model, which continuously outputs 42 facial feature points representing the shapes and positions of the mouth, eyes, and eyebrows. To evaluate C-Face, we embedded our technology into headphones and earphones and conducted a user study with nine participants, comparing the output of our system to the feature points produced by a state-of-the-art computer vision library (Dlib) from a front-facing camera. We found that the mean error across all 42 feature points was 0.77 mm for earphones and 0.74 mm for headphones. The mean error for the 20 major feature points capturing the most active areas of the face was 1.43 mm for earphones and 1.39 mm for headphones. The ability to continuously reconstruct facial expressions introduces new opportunities in a variety of applications. As a demonstration, we implemented and evaluated C-Face for two applications: facial expression detection (outputting emojis) and silent speech recognition. We further discuss the opportunities and challenges of deploying C-Face in real-world applications.
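
To make the evaluation metric concrete, the sketch below shows one way to extract reference landmarks with Dlib from a front-facing camera frame and compute a mean per-point error against the predicted feature points. It is a minimal sketch, not the paper's pipeline: the 42-point index subset of Dlib's 68-landmark model and the pixel-to-millimetre calibration constant (mm_per_pixel) are illustrative assumptions.

    # Sketch: reference landmarks from Dlib and mean per-point error in mm.
    # FEATURE_IDX and mm_per_pixel are hypothetical placeholders, not the paper's values.
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    # Assumed 42-point subset of Dlib's 68 landmarks: eyebrows (17-26),
    # eyes (36-47), and mouth (48-67).
    FEATURE_IDX = list(range(17, 27)) + list(range(36, 48)) + list(range(48, 68))

    def dlib_landmarks(frame_gray):
        """Return a (42, 2) array of reference landmarks from a front-facing frame."""
        faces = detector(frame_gray, 1)
        shape = predictor(frame_gray, faces[0])
        return np.array([[shape.part(i).x, shape.part(i).y] for i in FEATURE_IDX],
                        dtype=float)

    def mean_landmark_error(pred_pts, ref_pts, mm_per_pixel=0.25):
        """Mean Euclidean distance between predicted and reference points, in mm.

        mm_per_pixel is an assumed calibration constant for illustration only.
        """
        err_px = np.linalg.norm(pred_pts - ref_pts, axis=1)
        return float(err_px.mean() * mm_per_pixel)

In use, pred_pts would hold the 42 points output by the ear-mounted-camera model for the same frame, aligned to the front-facing camera's coordinate system before the error is computed.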
