A robust speech/non-speech detection algorithm using time and frequency-based features

The authors address the problem of automatic endpoint detection in normal and adverse conditions. Attention is given to endpoint detection under both additive noise and noise-induced changes in the talker's speech production (the Lombard reflex). After comparing several automatic endpoint detection algorithms in different noisy-Lombard conditions, the authors propose a new algorithm. It identifies islands of reliability (essentially the portion of speech contained between the first and last vowel) using time- and frequency-based features, and then applies a noise-adaptive procedure to refine the endpoints. The algorithm is shown to outperform the commonly used algorithm developed by Lamel et al. (1981) and several other recently developed methods.
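
The two-stage idea described above (find a reliable vowel-bounded island, then refine its endpoints against an adaptive noise estimate) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the features, thresholds, or adaptation rule. The sketch assumes short-time energy as the time-based feature, zero-crossing rate as a crude stand-in for a frequency-based feature, and a noise floor taken from the quietest 10% of frames. All function names and parameter values (frame_len, energy_ratio, max_zcr, noise_factor) are hypothetical.

```python
import math

def tone(freq_hz, amplitude, n_samples, rate_hz=8000):
    """Synthetic sine segment, used only to build a demo signal."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]

def frame_features(signal, frame_len):
    """Return (energy, zero_crossing_rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a < 0) != (b < 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats

def find_island(feats, energy_ratio=0.25, max_zcr=0.25):
    """Assumed vowel test: high energy and low zero-crossing rate.
    Returns (first_frame, last_frame) of the island, or None."""
    peak = max((e for e, _ in feats), default=0.0)
    if peak <= 0.0:
        return None
    hits = [i for i, (e, z) in enumerate(feats)
            if e >= energy_ratio * peak and z <= max_zcr]
    return (hits[0], hits[-1]) if hits else None

def refine(feats, first, last, noise_factor=3.0):
    """Grow the island outward while frame energy stays above a
    threshold derived from the quietest 10% of frames (assumed rule)."""
    energies = sorted(e for e, _ in feats)
    quietest = energies[:max(1, len(energies) // 10)]
    threshold = noise_factor * sum(quietest) / len(quietest)
    while first > 0 and feats[first - 1][0] > threshold:
        first -= 1
    while last < len(feats) - 1 and feats[last + 1][0] > threshold:
        last += 1
    return first, last

def detect_endpoints(signal, frame_len=80):
    """Return (start_sample, end_sample_exclusive), or None if no island."""
    feats = frame_features(signal, frame_len)
    island = find_island(feats)
    if island is None:
        return None
    first, last = refine(feats, *island)
    return first * frame_len, (last + 1) * frame_len

def make_demo_signal():
    """Background, weak high-frequency onset, vowel-like tone,
    weak high-frequency offset, background (8 kHz sampling assumed)."""
    return (tone(1000, 0.01, 800) + tone(3000, 0.2, 400) +
            tone(200, 1.0, 1600) + tone(3000, 0.2, 400) +
            tone(1000, 0.01, 800))
```

On the synthetic signal from make_demo_signal(), the island covers only the loud low-frequency segment, and the refinement step extends it across the two weaker high-frequency segments on either side. Real speech would need overlapping frames, smoothing, and features chosen for the noise conditions at hand.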