Speech/laughter classification in meeting audio

In this paper, harmonicity information is incorporated into acoustic features to detect laughter and speech segments. The system uses an HMM (Hidden Markov Model) classifier trained on features derived from Pitch and Harmonic Frequency Scale based subband filters (PHFS). The harmonicity of a signal is characterized by the variation of its pitch and harmonics, so cascaded subband filters spread along the pitch and harmonic frequency scale are used to capture this information. The pitch band of the first filter layer spans 80 Hz to 300 Hz, and the entire filter bank covers 80 Hz to 8 kHz. Experiments are conducted on the ICSI meeting corpus (BMR and BED meetings). We achieve an average error rate of 0.84% on the BMR meetings and 3.64% on the BED meetings for segment-level speech/laughter detection. The results show that the proposed PHFS-based feature is robust and effective.
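The filter-bank idea above can be sketched in code. This is a minimal, hypothetical reconstruction, not the paper's actual filter design: it assumes the first layer is the quoted 80–300 Hz pitch band, that each subsequent layer scales that band geometrically until the 8 kHz upper limit, and that a per-band log energy is taken as the feature. The function names (`phfs_band_edges`, `phfs_band_energies`) and the 16 kHz sample rate are illustrative assumptions.

```python
import numpy as np

def phfs_band_edges(f_lo=80.0, f_hi=300.0, f_max=8000.0):
    """Band edges for a pitch-and-harmonic frequency scale (hypothetical).

    The first band is the pitch band (80-300 Hz, as in the abstract); each
    following band scales the previous one by the same ratio, and the last
    band is clipped at 8 kHz so the bank covers 80 Hz - 8 kHz contiguously.
    """
    edges = [f_lo]
    ratio = f_hi / f_lo
    hi = f_hi
    while hi < f_max:
        edges.append(hi)
        hi *= ratio
    edges.append(f_max)
    return edges

def phfs_band_energies(frame, sr=16000):
    """Log energy per subband for one analysis frame (illustrative feature)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)  # bin center frequencies
    edges = phfs_band_edges()
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = spec[(freqs >= lo) & (freqs < hi)]    # bins in this subband
        feats.append(np.log(band.sum() + 1e-10))     # floor avoids log(0)
    return np.array(feats)
```

In a full system, such per-frame feature vectors (with deltas appended) would be fed to speech and laughter HMMs, and each segment labeled by the higher-likelihood model; the classifier details here are outside the sketch.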
