A new voice activity detector using subband order-statistics filters for robust speech recognition

Technological barriers still prevent speech processing systems from working reliably under extremely noisy conditions. Emerging applications of speech technology, especially in wireless communications, digital hearing aids, and speech recognition, often require a noise reduction technique combined with a precise voice activity detector (VAD). This paper presents a new VAD that improves the robustness of speech detection in noisy environments and the performance of speech recognition systems. The algorithm uses long-term information about the speech signal to formulate the decision rule and estimates the subband SNR using specialized order-statistics filters (OSF). The proposed algorithm is compared with the most commonly used VADs in the field, both in terms of speech/nonspeech discrimination and in terms of recognition performance when the VAD is used in an automatic speech recognition (ASR) system. Experimental results demonstrate a sustained advantage over different VAD methods, including standard VADs such as G.729 and AMR, which are used as references, the VAD of the Advanced Front-End (AFE) for distributed speech recognition (DSR), and recently reported algorithms.
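As an illustration of the general idea only, the following minimal sketch shows how an order-statistics filter over a long-term window of frames could yield a subband SNR estimate and a speech/nonspeech decision. It is not the algorithm evaluated in this paper: the function names (`osf_quantile`, `osf_vad`), the fixed noise estimate taken from the first frames, the window length, the quantile `p`, and the threshold are all assumptions chosen for the example.

```python
import numpy as np

def osf_quantile(window, p):
    """Interpolated p-quantile of each subband over a window of frames.

    window: array of shape (frames_in_window, num_subbands), log-energies in dB.
    """
    ordered = np.sort(window, axis=0)          # order statistics per subband
    pos = p * (ordered.shape[0] - 1)           # fractional rank
    s = int(np.floor(pos))
    f = pos - s
    upper = min(s + 1, ordered.shape[0] - 1)
    return (1.0 - f) * ordered[s] + f * ordered[upper]

def osf_vad(log_energy, noise_frames=10, half_window=4, p=0.9, threshold_db=6.0):
    """Hypothetical long-term VAD: one boolean decision per frame.

    log_energy: array of shape (num_frames, num_subbands), in dB.
    All default parameter values are illustrative assumptions.
    """
    num_frames = log_energy.shape[0]
    # Assumption: the first frames contain noise only; no later noise update.
    noise = np.median(log_energy[:noise_frames], axis=0)
    decisions = np.zeros(num_frames, dtype=bool)
    for frame in range(num_frames):
        lo = max(0, frame - half_window)
        hi = min(num_frames, frame + half_window + 1)
        # Subband SNR: order-statistics estimate minus the noise level.
        snr = osf_quantile(log_energy[lo:hi], p) - noise
        # Decision rule: mean subband SNR against a fixed threshold.
        decisions[frame] = snr.mean() > threshold_db
    return decisions
```

Because the quantile is taken over neighbouring frames, a high value of `p` lets the decision extend a few frames beyond the edges of a speech segment, which is one way that long-term information can protect low-energy word boundaries.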