On the use of transient information in speech recognition

In this paper, we investigate how signal processing affects the performance of isolated-word recognition by varying several parameters related to time resolution. The vocabulary used, {"P", "B", "T", "D", "V", "Z"}, is a highly confusable subset of the 39-word alpha-digit database. We show that recognition performance is significantly improved by trace segmentation, which compresses the steady-state parts of the speech signal and refines the endpoints. By varying the cutoff frequency of the low-pass filter in the filterbank analysis, we observed an optimal region of cutoff frequencies ranging from 50 to 100 Hz (at -6 dB). Outside this region, performance does not deteriorate completely, even at very low cutoff frequencies where the transients are severely distorted. This behavior is explained by the spectral modification of the steady-state vowels that follow the initial transients.
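Trace segmentation is characterized above only by its effect: steady-state stretches are compressed. The following is a minimal sketch that assumes the common formulation. Feature frames are resampled at equal steps of cumulative distance along the spectral trajectory, so frames that barely change contribute almost no length. The function name `trace_segmentation`, the Euclidean frame distance, linear interpolation, and the fixed output length `num_points` are illustrative assumptions rather than the paper's settings. Endpoint refinement is not modeled.

```python
import numpy as np


def trace_segmentation(frames: np.ndarray, num_points: int) -> np.ndarray:
    """Resample spectral frames at equal intervals of cumulative trace length.

    Hypothetical sketch: Euclidean distance between consecutive frames and
    linear interpolation are assumptions, not the paper's exact choices.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 2 or frames.shape[0] < 2:
        raise ValueError("expected a 2-D array with at least two frames")
    if num_points < 2:
        raise ValueError("num_points must be at least 2")

    # Distance travelled between consecutive frames (near zero in steady states).
    steps = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    # Cumulative trace length at each original frame, starting from 0.
    trace = np.concatenate(([0.0], np.cumsum(steps)))
    total = trace[-1]

    if total == 0.0:
        # A completely static input has no trajectory; repeat the first frame.
        return np.repeat(frames[:1], num_points, axis=0)

    # Equally spaced positions along the trace, including both endpoints.
    targets = np.linspace(0.0, total, num_points)

    # Interpolate every channel as a function of trace length.
    out = np.empty((num_points, frames.shape[1]))
    for ch in range(frames.shape[1]):
        out[:, ch] = np.interp(targets, trace, frames[:, ch])
    return out
```

For example, `trace_segmentation(np.array([[0.0], [1.0], [1.0], [1.0], [1.0], [2.0]]), 3)` returns `[[0.0], [1.0], [2.0]]`. The four identical steady-state frames collapse into a single output frame, while the changes at either end are preserved.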