EMO: real-time emotion recognition from single-eye images for resource-constrained eyewear devices

Real-time user emotion recognition is highly desirable for many applications on eyewear devices such as smart glasses. However, enabling this capability on such devices is very challenging due to the tightly constrained image content (only eye-area images are available from the on-device eye-tracking camera) and the limited computing resources of the embedded system. In this paper, we propose and develop a novel system called EMO that recognizes, on a resource-limited eyewear device, the real-time emotions of the user who wears it. Unlike most existing solutions, which require whole-face images to recognize emotions, EMO uses only the single-eye-area images captured by the eyewear's eye-tracking camera. To achieve this, we design a customized deep-learning network that effectively extracts emotional features from input single-eye images, and a personalized feature classifier that accurately identifies a user's emotions. EMO also exploits the temporal locality and feature similarity among consecutive video frames from the eye-tracking camera to further reduce recognition latency and system resource usage. We implement EMO on two hardware platforms and conduct comprehensive experimental evaluations. Our results demonstrate that EMO can continuously recognize seven types of emotions at 12.8 frames per second with a mean accuracy of 72.2%, significantly outperforming the state-of-the-art approach while consuming far fewer system resources.
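The abstract does not detail how EMO exploits temporal locality among consecutive eye-tracking frames, so the following is only a minimal illustrative sketch of the general idea: when the current frame is nearly identical to the previous one, the previous prediction is reused and the expensive network inference is skipped. The names `EmotionCache` and `frame_similarity`, the pixel-difference similarity metric, and the 0.95 threshold are all hypothetical choices for this sketch, not the paper's actual design.

```python
import numpy as np

def frame_similarity(prev: np.ndarray, curr: np.ndarray) -> float:
    """Similarity in [0, 1] based on normalized mean absolute pixel difference.
    (Illustrative metric; the real system may compare learned features instead.)"""
    diff = np.abs(prev.astype(np.float32) - curr.astype(np.float32))
    return 1.0 - float(np.mean(diff)) / 255.0

class EmotionCache:
    """Reuse the last emotion prediction when consecutive frames are near-identical."""

    def __init__(self, model, threshold: float = 0.95):
        self.model = model          # callable: frame -> emotion label
        self.threshold = threshold  # similarity above which inference is skipped
        self.prev_frame = None
        self.prev_label = None

    def predict(self, frame: np.ndarray):
        if (self.prev_frame is not None
                and frame_similarity(self.prev_frame, frame) >= self.threshold):
            return self.prev_label  # cache hit: skip the network entirely
        # Cache miss: run the network and remember this frame and its label.
        self.prev_frame = frame
        self.prev_label = self.model(frame)
        return self.prev_label
```

With mostly static eye-area video, this kind of gating lets the classifier run on only a fraction of frames, which is one plausible way to cut both latency and energy use on an embedded platform.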
