BAUM-1: A Spontaneous Audio-Visual Face Database of Affective and Mental States

In affective computing applications, access to labeled spontaneous affective data is essential for testing algorithms under naturalistic and challenging conditions. Most databases available today are acted or do not contain audio data. We present BAUM-1, a spontaneous audio-visual face database of affective and mental states. The video clips in the database are obtained by recording the subjects from the frontal view using a stereo camera and from the half-profile view using a mono camera. The subjects are first shown a sequence of images and short video clips, carefully selected and timed to evoke a set of emotions and mental states. Then, they express their ideas and feelings about what they have watched in Turkish, in an unscripted and unguided way. The target emotions include the six basic ones (happiness, anger, sadness, disgust, fear, surprise) as well as boredom and contempt. We also target several mental states: unsure (including confused and undecided), thinking, concentrating, and bothered. Baseline experimental results on the BAUM-1 database show that recognition of affective and mental states under naturalistic conditions is quite challenging. The database is expected to enable further research on audio-visual affect and mental state recognition under close-to-real-world scenarios.
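For concreteness, the target label set described above can be written down as a small data structure. The following Python sketch is purely illustrative: the class and identifier names are our own assumptions, not part of any official BAUM-1 distribution or API.

```python
from enum import Enum

class Emotion(Enum):
    """The eight target emotions: the six basic ones plus boredom and contempt."""
    HAPPINESS = "happiness"
    ANGER = "anger"
    SADNESS = "sadness"
    DISGUST = "disgust"
    FEAR = "fear"
    SURPRISE = "surprise"
    BOREDOM = "boredom"
    CONTEMPT = "contempt"

class MentalState(Enum):
    """The four target mental states."""
    UNSURE = "unsure"  # covers confused and undecided
    THINKING = "thinking"
    CONCENTRATING = "concentrating"
    BOTHERED = "bothered"

# Combined label set for a 12-class affective/mental-state recognition task.
ALL_LABELS = [e.value for e in Emotion] + [m.value for m in MentalState]
```

Under this (hypothetical) encoding, a baseline recognition experiment on the database would draw its ground-truth class labels from ALL_LABELS.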
