Improved Audio-Visual Laughter Detection Via Multi-Scale Multi-Resolution Image Texture Features and Classifier Fusion

Efforts are afoot to design better context-aware human-computer interaction techniques that have knowledge of both their surroundings and the affective state of the user. One of the most important nonverbal behavioural cues for affective human-machine interaction is laughter. Automatic laughter detection is an interesting yet challenging problem that has gained increased attention from both the academic and industrial communities in recent years. The majority of existing laughter detection systems rely on either the audio or the video modality alone. Humans, however, typically rely on audio-visual cues during conversation and interaction, so improved results can be expected when both modalities are used. In this work, we propose a multimodal framework that analyzes the audio and video channels separately and then fuses their decisions. Conventional speech spectral and prosodic features are used for the audio channel, while new multi-scale, multi-resolution binarized statistical image features are proposed for the video channel owing to their improved expressive power. Experiments with the publicly available MAHNOB Laughter database show that decision-level fusion based on support vector machine classifiers outperforms single-modality approaches as well as previously proposed methods, whilst requiring only a fraction of the computational power.
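As a rough sketch of the two ingredients described above (not the paper's actual implementation), the Python snippet below computes a BSIF-style binarized texture histogram at several filter scales and fuses per-modality SVM scores at the decision level. All names (`bsif_like_histogram`, `multiscale_descriptor`), the random filter banks, feature dimensions, fusion weights, and data are illustrative placeholders: BSIF proper uses filters learned with ICA, and the real system would extract MFCC/prosodic audio features and mouth-region video frames from the MAHNOB clips.

```python
import numpy as np
from scipy.signal import convolve2d
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def bsif_like_histogram(img, filters):
    """Threshold filter responses at zero, pack the bits into per-pixel codes,
    and return the normalized code histogram (BSIF proper uses ICA-learned
    filters; random filters stand in for them here)."""
    codes = np.zeros(img.shape, dtype=np.int64)
    for bit, filt in enumerate(filters):
        resp = convolve2d(img, filt, mode="same", boundary="symm")
        codes += (resp > 0).astype(np.int64) << bit
    n_codes = 2 ** len(filters)
    hist, _ = np.histogram(codes, bins=n_codes, range=(0, n_codes))
    return hist / hist.sum()

# One fixed filter bank per scale; concatenating their histograms gives a
# multi-scale texture descriptor in the spirit of the proposed features.
scales, n_bits = (3, 5, 7), 8
banks = {s: [rng.normal(size=(s, s)) for _ in range(n_bits)] for s in scales}

def multiscale_descriptor(img):
    return np.concatenate([bsif_like_histogram(img, banks[s]) for s in scales])

# Hypothetical clip-level data: audio features (e.g. MFCC/prosodic statistics)
# and video features (texture descriptors pooled over mouth-region frames).
n_clips, n_train = 60, 45
X_audio = rng.normal(size=(n_clips, 39))
X_video = np.stack([multiscale_descriptor(rng.normal(size=(48, 48)))
                    for _ in range(n_clips)])
y = rng.integers(0, 2, size=n_clips)  # 1 = laughter, 0 = speech/other

# One SVM per modality; probability outputs allow score-level fusion.
audio_clf = SVC(kernel="rbf", probability=True).fit(X_audio[:n_train], y[:n_train])
video_clf = SVC(kernel="rbf", probability=True).fit(X_video[:n_train], y[:n_train])

# Decision-level fusion: weighted sum of per-modality laughter probabilities
# (the weights would normally be tuned on a validation set).
w_audio, w_video = 0.5, 0.5
p_fused = (w_audio * audio_clf.predict_proba(X_audio[n_train:])[:, 1]
           + w_video * video_clf.predict_proba(X_video[n_train:])[:, 1])
laughter_pred = (p_fused >= 0.5).astype(int)
print(laughter_pred)
```

The weighted-sum rule is only one possible fusion scheme; product rules or a second-stage classifier over the two scores are common alternatives, and the relative weighting lets the system lean on the more reliable modality.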
