EMO: real-time emotion recognition from single-eye images for resource-constrained eyewear devices

Real-time user emotion recognition is highly desirable for many applications on eyewear devices such as smart glasses. However, enabling this capability on such devices is very challenging due to the tightly constrained image content (only eye-area images are available from the on-device eye-tracking camera) and the limited computing resources of the embedded system. In this paper, we propose and develop a novel system called EMO that recognizes, on a resource-limited eyewear device, the real-time emotions of the user wearing it. Unlike most existing solutions, which require whole-face images to recognize emotions, EMO uses only the single-eye-area images captured by the eyewear's eye-tracking camera. To achieve this, we design a customized deep-learning network that effectively extracts emotional features from input single-eye images and a personalized feature classifier that accurately identifies a user's emotions. EMO also exploits the temporal locality and feature similarity among consecutive frames of the eye-tracking video to further reduce recognition latency and system resource usage. We implement EMO on two hardware platforms and conduct comprehensive experimental evaluations. Our results demonstrate that EMO continuously recognizes seven types of emotions at 12.8 frames per second with a mean accuracy of 72.2%, significantly outperforming the state-of-the-art approach while consuming far fewer system resources.
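The temporal-locality idea can be made concrete with a short illustration. Below is a minimal Python sketch of a frame-similarity gate that reuses the previous prediction when consecutive eye-camera frames barely change and runs the full network only on sufficiently different frames. This is an assumption-laden sketch, not the paper's actual implementation: the names (`FrameGate`, `run_emotion_network`), the mean-absolute-difference metric, and the threshold value are all hypothetical.

```python
import numpy as np


def run_emotion_network(frame):
    """Stand-in for the full single-eye emotion network (the expensive path).

    A real system would run the customized deep network and personalized
    classifier here; we return a fixed label so the sketch is runnable.
    """
    return 0


class FrameGate:
    """Skips full inference when consecutive frames are nearly identical."""

    def __init__(self, threshold=0.02):
        self.threshold = threshold  # mean-absolute-difference cutoff (hypothetical value)
        self.prev_frame = None      # last frame that triggered full inference
        self.prev_label = None      # cached emotion label for reuse

    def predict(self, frame):
        # Normalize to [0, 1] so the threshold is independent of pixel depth.
        f = frame.astype(np.float32) / 255.0
        if self.prev_frame is not None:
            diff = float(np.mean(np.abs(f - self.prev_frame)))
            if diff < self.threshold:
                # Frames nearly identical: reuse the cached prediction.
                return self.prev_label
        # Frame changed enough: pay for a full forward pass and refresh the cache.
        self.prev_label = run_emotion_network(frame)
        self.prev_frame = f
        return self.prev_label
```

The design trade-off is the usual one for such gates: a lower threshold recovers more of the full-inference accuracy at the cost of more frequent network runs, while a higher threshold saves more compute but risks serving stale labels across genuine expression changes.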
