Differentiating Laughter Types via HMM/DNN and Probabilistic Sampling

In human speech, laughter plays a special role as an important non-verbal element, signaling general positive affect and cooperative intent. However, laughter occurrences can be categorized into several sub-groups, each with a slightly or significantly different role in human conversation. This means that, besides automatically locating laughter events in human speech, it would be beneficial to categorize them automatically as well. In this study, we focus on laughter events occurring in Hungarian spontaneous conversations. First, we use the manually annotated occurrence time segments, and the task is simply to determine the correct laughter type using Deep Neural Networks (DNNs). Second, we also seek to localize the laughter events, for which we utilize Hidden Markov Models (HMMs). Detecting different laughter types also poses a challenge to DNNs because of the low number of training examples for specific types, but this can be handled by applying probabilistic sampling during frame-level DNN training.
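To illustrate the idea behind probabilistic sampling, the following is a minimal sketch, not the authors' implementation. It assumes the common formulation in which a class is drawn first, with a probability that interpolates between the uniform distribution and the class priors through a parameter lam (lam = 0 keeps the original class distribution, lam = 1 samples all classes equally often), and a training frame is then drawn from the chosen class. The function names, the parameter name lam, and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def class_sampling_probs(labels, lam):
    """Return the class-selection probabilities for a given lam in [0, 1].

    Assumed formula: P(c) = lam * (1 / K) + (1 - lam) * prior(c),
    where K is the number of classes and prior(c) is the relative
    frequency of class c among the training frames.
    """
    counts = Counter(labels)
    k = len(counts)
    n = len(labels)
    return {c: lam / k + (1.0 - lam) * cnt / n for c, cnt in counts.items()}


def sample_frames(labels, lam, rng):
    """Yield indices of training frames, drawn in two steps:
    first a class (using the interpolated probabilities), then a
    frame chosen uniformly at random from that class."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    probs = class_sampling_probs(labels, lam)
    classes = sorted(probs)
    weights = [probs[c] for c in classes]
    while True:
        chosen = rng.choices(classes, weights=weights, k=1)[0]
        yield rng.choice(by_class[chosen])


if __name__ == "__main__":
    # Toy, made-up frame labels: one frequent class and two rare ones.
    labels = ["speech"] * 90 + ["laugh_type_1"] * 8 + ["laugh_type_2"] * 2
    rng = random.Random(0)
    sampler = sample_frames(labels, lam=1.0, rng=rng)
    drawn = Counter(labels[next(sampler)] for _ in range(3000))
    print(class_sampling_probs(labels, lam=0.5))
    print(drawn)  # with lam = 1.0 the three classes appear roughly equally often
```

With intermediate values of lam, rare laughter types are presented to the network more often than their share of the data would suggest, while the frequent classes still dominate; the value that works best would have to be tuned on held-out data.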
