Comparison of voice activity detection algorithms for VoIP

We discuss techniques for voice activity detection (VAD) for voice over Internet Protocol (VoIP). VAD reduces the bandwidth requirement of a voice session, thereby improving bandwidth efficiency. We compare speech quality, level of compression, and computational complexity for three time-domain and three frequency-domain VAD algorithms. The time-domain algorithms are computationally simple to implement; however, the frequency-domain algorithms yield better speech quality. For all the algorithms, we present a comparison of merits and demerits along with the subjective quality of speech after removal of silence periods. A quantitative measurement of speech quality for the different algorithms is also presented.
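To illustrate why time-domain VAD is computationally simple, the following is a minimal sketch of a fixed-threshold frame-energy detector. It is not one of the algorithms compared in this work: the function names, the 80-sample frame length (10 ms at an assumed 8 kHz sampling rate), and the threshold value are all assumptions chosen for illustration.

```python
import math


def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames; return each frame's mean-square energy."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(samples, frame_len=80, threshold=0.01):
    """Return one boolean per frame: True when the frame energy exceeds the threshold.

    frame_len and threshold are illustrative values, not taken from the paper.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]


# Synthetic signal at an assumed 8 kHz rate: silence, a 440 Hz tone, then silence.
rate = 8000
silence = [0.0] * 80
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(80)]
signal = silence + tone + silence

decisions = energy_vad(signal)
# Fraction of frames that would be transmitted; the rest are dropped as silence.
kept = sum(decisions) / len(decisions)
```

Each frame costs one multiply-accumulate per sample and a single comparison, which is the kind of low complexity that makes time-domain detectors attractive. A fixed threshold is the simplest possible choice; a practical detector would typically adapt it to the background noise level.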
