On the use of transient information in speech recognition
In this paper we investigate how signal processing affects the performance of isolated-word recognition by varying several parameters related to time resolution. The vocabulary used, {"P", "B", "T", "D", "V", "Z"}, is a highly confusable subset of the 39-word alpha-digit database. We show that recognition performance is significantly improved by trace segmentation, which compresses the steady-state parts of the speech signal and refines the endpoints. By varying the cutoff frequency of the low-pass filter in the filterbank analysis, we observe an optimal region of cutoff frequencies ranging from 50 to 100 Hz (at -6 dB). Outside this region, performance does not collapse completely, even at very low cutoff frequencies where the transients are severely distorted. This phenomenon is explained by the spectral modification of the steady-state vowels that follow the initial transients.
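To make the idea of trace segmentation concrete, the following is a minimal sketch of the technique as it is commonly described: the feature vectors of an utterance are treated as a trajectory, and the trajectory is resampled at equal steps of accumulated spectral change, so that stationary stretches contribute few output frames and transients contribute many. This is an illustration under stated assumptions, not the implementation used in the paper. The function name `trace_segmentation`, the Euclidean distance between frames, the linear interpolation, and the toy one-dimensional features are all assumptions made here.

```python
import math


def trace_segmentation(frames, num_out):
    """Resample feature frames at equal steps along their trace.

    frames  : list of equal-length feature vectors (lists of floats)
    num_out : number of output frames (at least 2)

    Assumption: Euclidean distance measures spectral change between
    consecutive frames; the paper may use a different distance.
    """
    if not frames or num_out < 2:
        raise ValueError("need at least one frame and num_out >= 2")

    # Accumulated trace length up to each input frame.
    cum = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        cum.append(cum[-1] + math.dist(prev, cur))
    total = cum[-1]

    # A perfectly stationary input has no trace to follow.
    if total == 0.0:
        return [list(frames[0]) for _ in range(num_out)]

    out = []
    j = 0  # index of the input segment [j, j + 1] containing the target
    for k in range(num_out):
        target = total * k / (num_out - 1)
        while j < len(frames) - 2 and cum[j + 1] < target:
            j += 1
        span = cum[j + 1] - cum[j]
        w = 0.0 if span == 0.0 else (target - cum[j]) / span
        # Linear interpolation between the two neighbouring input frames.
        out.append([a + w * (b - a) for a, b in zip(frames[j], frames[j + 1])])
    return out


# Toy input: three steady frames, a transition, three steady frames.
toy = [[0.0], [0.0], [0.0], [1.0], [2.0], [2.0], [2.0]]
compressed = trace_segmentation(toy, 5)
```

In this toy case the six steady-state frames collapse onto the two endpoints of the output, while the remaining output frames are spread evenly over the transition, which is the compression behaviour the abstract refers to.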