An Effective Real-Time Audio Segmentation Method Based on Time-Frequency Energy Analysis

Audio segmentation is a vital preprocessing step in several audio processing applications. This paper proposes an effective multi-stage, real-time audio segmentation method based on time-frequency energy analysis. An energy distribution model for different frequency bands is built in the Mel frequency domain. In the rough segmentation stage, the start and end points of audio are estimated from time-domain energy. Audio and silence show different frequency-domain energy characteristics under this energy distribution model, so in the precise segmentation stage the endpoints are then detected from frequency-domain energy. The strategies for initializing and dynamically adjusting the thresholds are also described. Experimental results show that, compared with GLR-BIC, the method reduces the false alarm rate by 3.6% and the missed detection rate by 7.0%; compared with the double-threshold method, it reduces them by 7.7% and 11.5%, respectively. Statistics show that audio recognition accuracy is higher for sentences lasting 1 s to 6 s and 6 s to 10 s, and 98% of the sentences segmented by this method fall within these duration ranges, a higher proportion than with the other two methods.
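The abstract names the ingredients of the method (a Mel-domain band energy model, a rough stage driven by time-domain energy, a precise stage driven by frequency-domain energy, and initialized, dynamically adjusted thresholds) but not their exact formulas. The Python sketch below shows one way such a two-stage detector could fit together; it is not the authors' algorithm. Everything specific is an assumption: rectangular bands equally spaced on the Mel scale, thresholds initialized from the first `init_frames` frames (assumed to be silence), an exponential moving average that updates the time-domain noise level on silence frames, and a hypothetical frequency-domain rule that treats a frame as audio when at least `min_bands` bands exceed `band_factor` times their noise floor. All function and parameter names are illustrative.

```python
import math
import random


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_energies(frames, sr, n_bands):
    """Assumed band model: frame energy in n_bands rectangular bands equally spaced in Mel."""
    n = len(frames[0])
    bins = range(n // 2 + 1)  # non-negative DFT bins 0..n/2
    cos_rows = [[math.cos(2 * math.pi * k * t / n) for t in range(n)] for k in bins]
    sin_rows = [[math.sin(2 * math.pi * k * t / n) for t in range(n)] for k in bins]
    edges = [mel_to_hz(hz_to_mel(sr / 2) * b / n_bands) for b in range(1, n_bands)]
    band_of = [sum(1 for e in edges if k * sr / n >= e) for k in bins]  # bin -> band index
    result = []
    for frame in frames:
        bands = [0.0] * n_bands
        for k in bins:  # naive DFT keeps the sketch dependency-free
            re = sum(x * c for x, c in zip(frame, cos_rows[k]))
            im = sum(x * s for x, s in zip(frame, sin_rows[k]))
            bands[band_of[k]] += re * re + im * im
        result.append(bands)
    return result


def rough_segments(energies, init_frames, factor, alpha):
    """Stage 1: runs of frames whose time-domain energy exceeds an adaptive threshold."""
    noise = sum(energies[:init_frames]) / init_frames  # assumes the input starts with silence
    segments, start = [], None
    for i, e in enumerate(energies):
        if e > factor * noise:
            if start is None:
                start = i
        else:
            noise = alpha * noise + (1 - alpha) * e  # update noise level on silence frames
            if start is not None:
                segments.append((start, i))  # half-open [start, end) frame range
                start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments


def freq_active(bands, init_frames, band_factor, min_bands):
    """Hypothetical rule: audio if >= min_bands bands exceed band_factor x their noise floor."""
    n_bands = len(bands[0])
    floor = [sum(b[j] for b in bands[:init_frames]) / init_frames for j in range(n_bands)]
    return [sum(b[j] > band_factor * floor[j] for j in range(n_bands)) >= min_bands
            for b in bands]


def refine(seg, active):
    """Stage 2: grow the boundaries over active frames, then trim inactive edge frames."""
    s, e = seg
    while s > 0 and active[s - 1]:
        s -= 1
    while e < len(active) and active[e]:
        e += 1
    while s < e and not active[s]:
        s += 1
    while e > s and not active[e - 1]:
        e -= 1
    return (s, e) if s < e else None  # None: not confirmed in the frequency domain


def segment(signal, sr, frame_len=128, n_bands=8, init_frames=10,
            factor=3.0, alpha=0.95, band_factor=3.0, min_bands=2):
    """Two-stage endpoint detection; returns (start_s, end_s) pairs in seconds."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]
    energies = [sum(x * x for x in f) for f in frames]  # time-domain frame energy
    active = freq_active(mel_band_energies(frames, sr, n_bands), init_frames,
                         band_factor, min_bands)
    refined = []
    for seg in rough_segments(energies, init_frames, factor, alpha):
        r = refine(seg, active)
        if r is None:
            continue
        if refined and r[0] <= refined[-1][1]:  # merge segments that grew into each other
            refined[-1] = (refined[-1][0], max(refined[-1][1], r[1]))
        else:
            refined.append(r)
    return [(s * frame_len / sr, e * frame_len / sr) for s, e in refined]


if __name__ == "__main__":
    sr = 8000
    rng = random.Random(0)
    # 0.5 s of low-level noise, then 1 s of a 440 Hz tone added to it, then 0.5 s of noise
    sig = [rng.uniform(-0.01, 0.01) for _ in range(2 * sr)]
    for i in range(sr // 2, 3 * sr // 2):
        sig[i] += 0.5 * math.sin(2 * math.pi * 440 * i / sr)
    print(segment(sig, sr))
```

On the synthetic input in the `__main__` block, the sketch should report a single segment with endpoints close to 0.5 s and 1.5 s. The naive DFT and non-overlapping, unwindowed frames only keep the example dependency-free; a real-time implementation would more likely use an FFT with overlapping, windowed frames and would tune the factors on real recordings.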

Wei Zhang | Lin Ma | Haifeng Li | Chang Gao
