Detection of Laughter-in-Interaction in Multichannel Close-Talk Microphone Recordings of Meetings

Laughter is a key element of human-human interaction, and it occurs surprisingly often in multi-party conversation. In meetings, laughter accounts for almost 10% of vocalization effort by time, and it is known to be relevant to topic segmentation and to the automatic characterization of affect. We present a system for detecting laughter and attributing it to specific participants. The system relies on simultaneously decoding the vocal activity of all participants, given multichannel recordings. This framework allows us to disambiguate laughter and speech not only acoustically, but also by constraining the number of simultaneous speakers and the number of simultaneous laughers independently, since participants tend to take turns speaking but laugh together. We present experiments on 57 hours of meeting data, containing almost 11,000 unique instances of laughter.
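
To illustrate the idea of constraining simultaneous speakers and simultaneous laughers independently, the following minimal sketch enumerates the joint vocal-activity states that remain once both caps are applied. It is an illustration only and not the system's decoder: the three-way state inventory, the function name `constrained_joint_states`, and the cap values used in the usage line are all assumptions made for this example.

```python
from itertools import product

# Hypothetical per-participant state inventory (an assumption for illustration).
STATES = ("silence", "speech", "laughter")


def constrained_joint_states(num_participants, max_speakers, max_laughers):
    """Enumerate joint states that respect two independent caps.

    Each joint state assigns one label from STATES to every participant.
    A joint state is kept only if the number of participants speaking does
    not exceed max_speakers and the number laughing does not exceed
    max_laughers.
    """
    allowed = []
    for joint in product(STATES, repeat=num_participants):
        if (joint.count("speech") <= max_speakers
                and joint.count("laughter") <= max_laughers):
            allowed.append(joint)
    return allowed


# Example with assumed values: three participants, at most one speaker,
# but everyone may laugh at once.
states = constrained_joint_states(3, max_speakers=1, max_laughers=3)
print(len(states), "of", len(STATES) ** 3, "joint states remain")
```

With a tight cap on speakers and a loose cap on laughers, joint states in which several participants talk at once are excluded, while states in which everyone laughs together are retained. This mirrors the observation that participants tend to take turns speaking but laugh together.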
