DIF: Dataset of Perceived Intoxicated Faces for Drunk Person Identification

Traffic accidents cause over a million deaths every year, a large fraction of which are attributed to drunk driving. An automated in-vehicle system for detecting intoxicated drivers would help reduce accidents and the associated financial costs. Existing solutions require special equipment such as electrocardiogram sensors, infrared cameras, or breathalyzers. In this work, we propose a new dataset, DIF (Dataset of perceived Intoxicated Faces), which contains audio-visual data of intoxicated and sober people obtained from online sources. To the best of our knowledge, this is the first work on automatic bimodal non-invasive intoxication detection. Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) are trained to compute the video and audio baselines, respectively. A 3D CNN is used to exploit the spatio-temporal changes in the video. We also propose a simple variation of the traditional 3D convolution block that induces a non-linearity between the spatial and temporal channels. Extensive experiments are performed to validate the approach and the baselines.
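
The abstract does not spell out the exact form of the modified 3D convolution block. One plausible reading, offered here only as a minimal sketch, is a factorized block: a spatial convolution applied to each frame, then a ReLU, then a temporal convolution across frames. The function names, the single-channel nested-list clip layout, the choice of ReLU, and the valid-padding convention below are all illustrative assumptions, not details taken from the paper.

```python
from typing import List

# Assumed layout (hypothetical): a single-channel clip indexed as clip[t][y][x].
Frame = List[List[float]]
Clip = List[Frame]


def spatial_conv(clip: Clip, kernel: List[List[float]]) -> Clip:
    """Apply the same 2D kernel to every frame (valid padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for frame in clip:
        h, w = len(frame), len(frame[0])
        out.append([
            [
                sum(kernel[i][j] * frame[y + i][x + j]
                    for i in range(kh) for j in range(kw))
                for x in range(w - kw + 1)
            ]
            for y in range(h - kh + 1)
        ])
    return out


def relu(clip: Clip) -> Clip:
    """Element-wise ReLU; the non-linearity placed between the two stages."""
    return [[[max(0.0, v) for v in row] for row in frame] for frame in clip]


def temporal_conv(clip: Clip, kernel: List[float]) -> Clip:
    """Apply a 1D kernel along the time axis at every pixel (valid padding)."""
    kt = len(kernel)
    h, w = len(clip[0]), len(clip[0][0])
    return [
        [
            [sum(kernel[k] * clip[t + k][y][x] for k in range(kt))
             for x in range(w)]
            for y in range(h)
        ]
        for t in range(len(clip) - kt + 1)
    ]


def factorized_block(clip: Clip,
                     spatial_kernel: List[List[float]],
                     temporal_kernel: List[float],
                     nonlinearity: bool = True) -> Clip:
    """Spatial conv -> (optional) ReLU -> temporal conv.

    With nonlinearity=False the two stages compose into a single linear
    3D filter; with nonlinearity=True they do not.
    """
    features = spatial_conv(clip, spatial_kernel)
    if nonlinearity:
        features = relu(features)
    return temporal_conv(features, temporal_kernel)


# Tiny usage example: three 2x2 frames, a 2x2 summing kernel, a length-3 summing kernel.
clip = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
    [[2.0, 0.0], [0.0, 0.0]],
]
ones_2x2 = [[1.0, 1.0], [1.0, 1.0]]
ones_3 = [1.0, 1.0, 1.0]

with_relu = factorized_block(clip, ones_2x2, ones_3, nonlinearity=True)
without_relu = factorized_block(clip, ones_2x2, ones_3, nonlinearity=False)
```

In this toy case the two variants give different outputs because the ReLU zeroes the negative spatial response of the middle frame before the temporal sum. That is the sense in which a non-linearity between the stages changes what the block can represent compared with a purely linear 3D filter.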
