VAD for VoIP Using Cepstrum

As telephony services move onto the Internet, attention has turned to multiplexing many speech streams by exploiting the characteristics of speech. The multiplexing gain is an important factor when applications such as teleconferencing are ported to the Internet. Here we discuss Voice Activity Detection (VAD) for Voice over Internet Protocol (VoIP) based on the cepstrum. VAD saves bandwidth in a voice session: non-speech packets are simply not transmitted. The scheme is implemented in the application layer, so the VAD is independent of the lower layers. Standard codecs include their own VAD algorithms to reduce bandwidth, but these are costly and computationally complex. In this paper, we compare the speech quality, level of compression, and computational complexity of our cepstrum-based VAD with those of the standard GSM and ITU-T G.729 codecs. Our algorithm adapts to varying background noise.
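
To make the idea concrete, the following is a minimal sketch of a cepstral-distance VAD with a noise template that adapts over time. It is not the algorithm evaluated in the paper, whose details are not given here. The function names (`real_cepstrum`, `cepstral_vad`), the 13 coefficients, the threshold of 1.0, the smoothing factor `alpha`, the assumption that the first few frames contain only background noise, and the synthetic test signal are all illustrative assumptions.

```python
import cmath
import math
import random


def real_cepstrum(frame, n_coeffs=13):
    """First n_coeffs real-cepstrum coefficients c[0..n_coeffs-1] of one frame."""
    n = len(frame)
    # Log-magnitude spectrum via a direct O(n^2) DFT (adequate for a sketch).
    log_mag = []
    for k in range(n):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        log_mag.append(math.log(abs(x_k) + 1e-10))
    # The log magnitude is real and even, so its inverse DFT is a cosine sum.
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
            for q in range(n_coeffs)]


def cepstral_vad(frames, init_frames=3, threshold=1.0, alpha=0.1):
    """Return one boolean per frame: True = speech (transmit), False = drop."""
    ceps = [real_cepstrum(f) for f in frames]
    # Assumption: the first init_frames frames contain background noise only.
    noise = [sum(c[i] for c in ceps[:init_frames]) / init_frames
             for i in range(len(ceps[0]))]
    flags = []
    for idx, c in enumerate(ceps):
        # Euclidean distance between this frame and the noise template.
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c, noise)))
        is_speech = idx >= init_frames and dist > threshold
        if not is_speech:
            # Follow slowly varying background noise by exponential smoothing.
            noise = [(1 - alpha) * b + alpha * a for a, b in zip(c, noise)]
        flags.append(is_speech)
    return flags


# Synthetic demo (assumption): quiet Gaussian noise stands in for background,
# loud Gaussian bursts stand in for speech. Real speech would be used in practice.
rng = random.Random(0)


def burst(sigma, n=64):
    return [rng.gauss(0.0, sigma) for _ in range(n)]


frames = ([burst(0.01) for _ in range(4)]
          + [burst(0.5) for _ in range(4)]
          + [burst(0.01) for _ in range(4)])
flags = cepstral_vad(frames)
sent = [f for f, s in zip(frames, flags) if s]  # only speech frames are packetized
print(flags)
print(f"transmitted {len(sent)} of {len(frames)} frames")
```

Only frames flagged as speech would be packetized and sent; the dropped fraction is the bandwidth saving. Because the noise template is updated only on frames judged non-speech, the detector tracks gradual changes in the background without absorbing speech into the template.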
