Paper information - Perceived Speech Quality Estimation Using DTW Algorithm

Perceived Speech Quality Estimation Using DTW Algorithm

This paper evaluates a method for speech quality estimation by simulating the transfer of speech over packet-switched and mobile networks. The proposed system uses the Dynamic Time Warping (DTW) algorithm to compare the test speech with the received speech. Several tests were run on a test speech sample from a single speaker, with simulated packet (frame) loss effects on the perceived speech. The results were compared with PESQ values measured on the same transmission channel, and their correlation was observed.

[...] measured or estimated transport parameters of the received speech. This paper presents research on the usability of the DTW method for speech quality estimation. DTW is a sequence matching algorithm, applied here to the test and received speech sequences after transmission over packet-switched or mobile communication channels. The algorithm compares arrays of mel-cepstral coefficients, which approximate the perception of the human auditory system, and it is commonly used as a building block for simple speech recognizers (2). Three speech codecs were used in the experiments: G.711 (3), AMR 12.2 kb/s (compatible with GSM-EFR) (4) and G.729. Packet reception errors are modeled as random and bursty packet loss. Low bit rate (high compression ratio) codecs reduce the required bandwidth, but they distort the original waveform significantly before it is even transmitted. The compressed speech produced by such codecs is also more sensitive to packet loss (5). Different values of the similarity metric are observed when the test and received speech sequences are compared while varying the probability of packet loss and the probability of burstiness during packet errors (expressed as a percentage of lost packets or frames). The results were compared with PESQ values (ITU-T P.862) (6) measured on the transmission channel. They show high correlation, which supports the use of this technique as a simple tool for perceived speech quality measurement in VoIP and GSM networks.
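The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each speech signal has already been converted to a list of per-frame feature vectors (for example mel-cepstral coefficients), that the local distance is Euclidean, and that a lost frame is simply dropped from the received sequence (a real decoder would typically apply concealment instead). The names `dtw_distance` and `gilbert_loss_mask`, the parameters `p_loss` and `p_burst`, and the normalisation by `n + m` are all hypothetical choices made for illustration.

```python
import math
import random


def dtw_distance(ref, deg):
    """Length-normalised DTW cost between two sequences of feature vectors.

    Assumption: Euclidean local distance and the basic three-way step
    pattern; the paper does not specify either choice.
    """
    n, m = len(ref), len(deg)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning ref[:i] with deg[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], deg[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # ref frame repeated
                                 cost[i][j - 1],      # deg frame repeated
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m] / (n + m)


def gilbert_loss_mask(n_frames, p_loss, p_burst, seed=0):
    """Hypothetical two-state loss pattern; True marks a lost frame.

    p_loss:  probability of losing a frame after a received frame
    p_burst: probability of losing a frame after a lost frame
    """
    rng = random.Random(seed)
    mask, lost = [], False
    for _ in range(n_frames):
        lost = rng.random() < (p_burst if lost else p_loss)
        mask.append(lost)
    return mask


if __name__ == "__main__":
    # Toy 2-dimensional "features"; real input would be mel-cepstral frames.
    reference = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(50)]
    mask = gilbert_loss_mask(len(reference), p_loss=0.1, p_burst=0.5, seed=1)
    received = [f for f, lost in zip(reference, mask) if not lost]
    print("lost frames:", sum(mask))
    print("DTW distance:", dtw_distance(reference, received))
```

A distance of zero means the two sequences can be warped onto each other exactly. Larger values indicate stronger degradation, and a score of this kind could then be correlated with PESQ values measured on the same channel.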

Ivan Kraljevski | Zoran Gacovski | Sime Arsenovski | Slavcho Chungurski

[1] Donald F. Towsley, et al. Modeling frame-level errors in GSM wireless channels, 2002, Global Telecommunications Conference, 2002. GLOBECOM '02. IEEE.

[2] Gerardo Rubino, et al. A method for quantitative evaluation of audio quality over packet networks and its comparison with existing techniques (in MESAQUIN'04, Prague, June 2004), 2004.

[3] Lingfen Sun, et al. Perceived speech quality prediction for voice over IP-based networks, 2002, 2002 IEEE International Conference on Communications. Conference Proceedings. ICC 2002 (Cat. No.02CH37333).