Formulation of a vector distance measure for the instantaneous-frequency distribution (IFD) of speech
The instantaneous-frequency distribution (IFD) of the outputs of a filter bank of overlapping band-pass channels (or the equivalent DFT implementation) has been proposed [1] as a short-time spectral measure useful for representing speech. For application of the IFD in conventional tasks such as word recognition, an appropriate vectorization is needed on which a useful distance function can be defined. A number of recent studies have shown the cepstral representation to be generally most effective in terms of recognition accuracy. We propose a "pseudo-cepstral" format which treats the IFD as if it were a log-power spectrum, with a matrix-weighted squared-Euclidean distance function. Comparative results are given for two forms of IFD as well as two conventional cepstral transforms, in a small-scale word-recognition task with added white Gaussian noise. In addition, a statistic is derived which is highly correlated with recognition error rates, but requires far less computation than is involved in an actual recognition trial.
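The pseudo-cepstral idea can be illustrated with a minimal sketch. The abstract does not give the transform or the weighting matrix, so everything specific here is an assumption: a plain DCT-II stands in for the cepstral transform, and the function names `pseudo_cepstrum` and `weighted_sq_distance`, the parameter `n_coeffs`, and the identity weight in the usage lines are hypothetical choices made only for illustration.

```python
import numpy as np


def pseudo_cepstrum(ifd, n_coeffs):
    """Cosine-transform an IFD frame as if it were a log-power spectrum.

    Assumption: an unnormalized DCT-II scaled by 1/n stands in for the
    cepstral transform; the paper's exact transform is not given here.
    """
    ifd = np.asarray(ifd, dtype=float)
    n = ifd.shape[0]
    k = np.arange(n_coeffs)[:, None]   # coefficient (quefrency) index
    m = np.arange(n)[None, :]          # channel index
    basis = np.cos(np.pi * k * (m + 0.5) / n)
    return basis @ ifd / n


def weighted_sq_distance(a, b, weight):
    """Matrix-weighted squared-Euclidean distance (a - b)^T W (a - b)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff @ np.asarray(weight, dtype=float) @ diff)


# Usage with synthetic data (not real IFD measurements).
rng = np.random.default_rng(0)
frame_a = rng.random(32)
frame_b = rng.random(32)
c_a = pseudo_cepstrum(frame_a, n_coeffs=12)
c_b = pseudo_cepstrum(frame_b, n_coeffs=12)
W = np.eye(12)  # hypothetical weight; identity gives the plain squared distance
d = weighted_sq_distance(c_a, c_b, W)
```

With `W` set to the identity the measure reduces to the ordinary squared-Euclidean distance between the two coefficient vectors; a non-identity positive semi-definite `W` would emphasize or suppress particular coefficients.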