Paper Information - Speech/Audio Signal Classification Using Spectral Flux Pattern Recognition

In this paper, we present a novel method for improving speech and audio signal classification using spectral flux (SF) pattern recognition for the MPEG Unified Speech and Audio Coding (USAC) standard. For effective pattern recognition, a Gaussian mixture model (GMM) is used as the probability model, and its parameters are estimated with the expectation-maximization (EM) algorithm. The proposed classification algorithm has two main parts: the first extracts the optimal GMM parameters, and the second distinguishes between speech and audio signals using SF pattern recognition. The proposed algorithm performs better than the conventionally implemented USAC scheme.
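The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the flux definition (squared distance between unit-normalised magnitude spectra of consecutive frames), the one-dimensional feature, the two-component mixture, and the per-class likelihood comparison in the hypothetical `classify` helper are all assumptions chosen for illustration, since the abstract does not specify them. All function names are likewise hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of a naive O(n^2) DFT, bins 0..n/2 (fine for short demo frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_flux(frames):
    """Squared L2 distance between unit-normalised spectra of consecutive frames."""
    flux, prev = [], None
    for frame in frames:
        mag = magnitude_spectrum(frame)
        norm = math.sqrt(sum(m * m for m in mag)) or 1.0  # avoid dividing by zero on silence
        mag = [m / norm for m in mag]
        if prev is not None:
            flux.append(sum((a - b) ** 2 for a, b in zip(mag, prev)))
        prev = mag
    return flux


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, k=2, iters=100, var_floor=1e-6):
    """Fit a 1-D, k-component GMM with EM; returns a list of (weight, mean, variance)."""
    n = len(data)
    lo, hi = min(data), max(data)
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, var_floor)
    # Deterministic initialisation (an assumption): means spread evenly over the data range.
    params = [(1.0 / k, lo + (j + 0.5) * (hi - lo) / k, grand_var) for j in range(k)]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v) for w, m, v in params]
            total = sum(joint)
            resp.append([p / total for p in joint] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        new_params = []
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:  # component has no support; keep its previous parameters
                new_params.append(params[j])
                continue
            mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
            new_params.append((nj / n, mean, max(var, var_floor)))
        params = new_params
    return params


def log_likelihood(data, params):
    """Total log-likelihood of the samples under a fitted GMM."""
    return sum(
        math.log(max(sum(w * gaussian_pdf(x, m, v) for w, m, v in params), 1e-300))
        for x in data
    )


def classify(flux_values, speech_gmm, audio_gmm):
    """Pick the class whose GMM explains the flux sequence better (assumed decision rule)."""
    if log_likelihood(flux_values, speech_gmm) > log_likelihood(flux_values, audio_gmm):
        return "speech"
    return "audio"
```

In this sketch, one mixture would be fitted with `fit_gmm_em` to flux values from labelled speech and another to flux values from labelled audio; `classify` then compares the two likelihoods for a new flux sequence. A practical system would replace the naive DFT with an FFT and would likely use a richer feature vector than a single flux value per frame pair.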

Jieun Kim | Sangkil Lee | Insung Lee
