Speech/music discrimination using hybrid-based feature extraction for audio data indexing

In this paper, we present a speech/music discrimination (SMD) approach that uses a hybrid feature-extraction scheme to separate noisy audio signals into speech and music. The hybrid SMD combines 1D signal processing with 2D image processing to extract multiple features. In general, a noisy audio segment can be regarded as music, speech, or noise (silence). The proposed hybrid SMD approach has been successfully applied to audio data indexing to classify noisy audio signals into speech, music, and noise. The approach comprises three main stages: pre-processing with voice activity detection (VAD), speech/music discrimination (SMD), and rule-based post-processing. In the first stage, pre-processing and VAD divide the audio recording stream into noise-only segments and noisy audio segments. In the second stage, the hybrid SMD classifies the noisy audio segments into speech segments and music segments. In the third stage, a rule-based post-filtering method is applied to improve discrimination accuracy and to reflect the temporal continuity of audio data. Experimental results show that the proposed hybrid SMD approach can be successfully applied to audio data indexing. Overall system accuracy is evaluated on radio recordings from various sources, and the results demonstrate significantly better classification for the envisaged tasks than existing methods.
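The three-stage pipeline described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the energy-based VAD, the zero-crossing-rate rule standing in for the hybrid 1D/2D classifier, and the majority-vote post-filter (along with all function names and thresholds) are simplifying assumptions chosen to show how the stages connect.

```python
import numpy as np

def short_time_energy(frame):
    # 1D feature: mean squared amplitude of one frame.
    return float(np.mean(frame ** 2))

def vad(frames, energy_threshold=1e-4):
    # Stage 1: flag each frame as noisy audio (True) or noise-only/silence
    # (False). A real VAD would be more robust; this threshold is illustrative.
    return [short_time_energy(f) > energy_threshold for f in frames]

def classify_smd(frame):
    # Stage 2: placeholder for the hybrid 1D/2D feature classifier.
    # Here a toy 1D cue, the zero-crossing rate: speech tends to have a
    # higher, more variable ZCR than sustained musical tones.
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    return "speech" if zcr > 0.1 else "music"

def smooth_labels(labels, window=3):
    # Stage 3: rule-based post-filter -- a sliding-window majority vote,
    # reflecting the temporal continuity of audio data.
    out = []
    for i in range(len(labels)):
        lo = max(0, i - window // 2)
        hi = min(len(labels), i + window // 2 + 1)
        seg = labels[lo:hi]
        out.append(max(set(seg), key=seg.count))
    return out

def index_audio(frames):
    # Full pipeline: VAD -> SMD on active frames -> rule-based smoothing.
    active = vad(frames)
    raw = ["noise" if not a else classify_smd(f)
           for f, a in zip(frames, active)]
    return smooth_labels(raw)
```

For example, feeding the pipeline a run of silent frames, then low-frequency sinusoidal frames, then white-noise frames yields "noise", "music", and "speech" labels respectively under these toy rules.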
