Normally, voice activity detection (VAD) refers to speech processing algorithms that detect the presence or absence of human speech in segments of audio signals. In this paper, however, we focus on speech detection algorithms that take VoIP traffic, rather than audio signals, as input. We call this category of algorithms network-level VAD. Traditional VAD usually plays a fundamental role in speech processing systems because of its ability to delimit speech segments. Network-level VAD, on the other hand, can be quite helpful in network management, which is the motivation for our study. We propose the first real-time network-level VAD algorithm that can extract voice activity from encrypted and non-silence-suppressed Skype traffic. We evaluate the speech detection accuracy of the proposed algorithm with extensive real-life traces. The results show that our scheme achieves reasonably good performance even when a high degree of randomness has been injected into the network traffic.
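As a rough illustration of what it means for a detector to take traffic rather than audio as input, the following Python sketch labels fixed time windows of a packet trace as speech or silence. It is not the algorithm proposed in this paper. The `Packet` type, the `network_vad` function, the 0.1 s default window, the 1.5× threshold and the lower-quartile baseline are all hypothetical, and the sketch simply assumes that packet sizes grow while someone is talking. Because the traffic is encrypted, only side information such as arrival times and packet sizes is visible. Because silence is not suppressed, packets keep flowing during pauses, so the mere presence of packets says nothing about voice activity.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds
    size: int         # packet size in bytes (the payload itself is encrypted)


def network_vad(packets, window=0.1, ratio=1.5):
    """Return one speech (True) / silence (False) label per time window.

    Hypothetical heuristic, an assumption and not the paper's method:
    a window is labeled as speech when its mean packet size exceeds
    `ratio` times a silence baseline, taken here as the lower quartile
    of the non-empty window means. Packets must be sorted by timestamp.
    """
    if not packets:
        return []
    start = packets[0].timestamp
    n_windows = int((packets[-1].timestamp - start) // window) + 1
    sums = [0] * n_windows
    counts = [0] * n_windows
    for p in packets:
        i = int((p.timestamp - start) // window)
        sums[i] += p.size
        counts[i] += 1
    # Mean packet size per window; None marks windows that saw no packets.
    means = [s / c if c else None for s, c in zip(sums, counts)]
    observed = sorted(m for m in means if m is not None)
    baseline = observed[len(observed) // 4]  # lower quartile as silence level
    # Empty windows are labeled as silence.
    return [m is not None and m > ratio * baseline for m in means]
```

Calling it on a trace of 40-byte packets that contains a run of 120-byte packets marks only the windows fully covered by that run as speech, provided the run is aligned to window boundaries.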