ROBUST SPEECH DETECTION WITH HETEROSCEDASTIC DISCRIMINANT ANALYSIS APPLIED TO THE TIME-FREQUENCY ENERGY

In this paper, we propose a robust speech detection algorithm with Heteroscedastic Discriminant Analysis (HDA) applied to the Time-Frequency Energy (TFE). The TFE consists of the log energy in time domain, the log energy in the fixed band 2503500 Hz, and the log Mel-scale frequency bands energy. The bottom-up algorithm with automatic threshold adjustment is used for accurate word boundary detection. Compared to the algorithms based on the energy in time domain [1], the ATF parameter [2], the energy and the LDA-MFCC parameter [3], the proposed algorithm shows better performance under different types of noise.

[1]  Aaron E. Rosenberg,et al.  An improved endpoint detector for isolated word recognition , 1981 .

[2]  Herman J. M. Steeneken,et al.  Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[3]  George Saon,et al.  Maximum likelihood discriminant feature spaces , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[4]  Géraldine Damnati,et al.  Robust speech/non-speech detection using LDA applied to MFCC , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).