A multiresolutionally oriented approach for determination of cepstral features in speech recognition

This paper presents an effort toward a more efficient speech signal representation intended for use in an automatic speech recognition system. We propose modified cepstral coefficients derived from a multiresolution auditory spectrum. The multiresolution spectrum is obtained using sliding single-point discrete Fourier transforms. We show that the resulting spectrum values are similar to the output of a nonuniform filtering operation. The proposed cepstral features are evaluated by incorporating them into a simple phone recognition system.
Speech processing for speech recognition is a form of perceptual signal analysis. Its goal is to identify a relatively small number of perceptually significant features of the speech signal. In general, such features are of finite extent in time, and several may occur in any given time interval. Conventional feature extraction methods used in state-of-the-art speech recognition systems are based on short-term features combined with dynamic features [3, 5]. These features, merged into a single feature vector, usually all have the same extent in time. Speech signals, however, exhibit many non-stationary phenomena, which are reflected in local properties of the signal. A fixed-window analysis alone describes these local properties poorly. This is why multiresolution signal analysis was introduced [10]. Wavelet transforms have become well known as useful multiresolution tools for signal analysis [4, ]. Another successful tool for multiresolution analysis, which also inspired our research, is the multiresolution Fourier transform [10]. These methods have been used successfully in many signal processing applications. In the speech recognition domain, however, both transforms are still being explored as a means of developing a better speech signal representation [1].
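To make the notion of a sliding single-point DFT concrete, the following Python sketch shows one common recursive formulation, in which the transform at a single frequency is updated sample by sample instead of being recomputed for every frame. It is a minimal illustration under stated assumptions, not the procedure used in this paper: the function names, the zero initial history, the test signal, and the (frequency, window length) pairs in `BANDS` are all hypothetical choices.

```python
import cmath
import math
from collections import deque


def direct_single_point_dft(frame, freq):
    """Reference: DFT of one frame at a single frequency (cycles per sample)."""
    return sum(s * cmath.exp(-2j * math.pi * freq * m) for m, s in enumerate(frame))


def sliding_single_point_dft(x, freq, window_len):
    """Single-point DFT of the most recent `window_len` samples, updated per sample.

    Uses the recursion X(n) = e^{jw} * (X(n-1) - x[n-N] + x[n] * e^{-jwN}),
    with w = 2*pi*freq and N = window_len. Samples before the start of `x`
    are assumed to be zero (an assumption of this sketch).
    """
    rot = cmath.exp(2j * math.pi * freq)                   # e^{+jw}
    tail = cmath.exp(-2j * math.pi * freq * window_len)    # e^{-jwN}
    history = deque([0.0] * window_len, maxlen=window_len)
    acc = 0j
    out = []
    for sample in x:
        oldest = history[0]        # x[n-N], the sample leaving the window
        history.append(sample)     # the deque drops `oldest` automatically
        acc = rot * (acc - oldest + sample * tail)
        out.append(acc)
    return out


# Hypothetical analysis points: a longer window at the lower frequency and a
# shorter window at the higher frequency (illustrative values only).
BANDS = [(0.05, 64), (0.20, 16)]

# Hypothetical test signal: two sinusoids at the analysis frequencies.
signal = [
    math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.20 * n)
    for n in range(200)
]

# One magnitude track per analysis point.
magnitude_tracks = [
    [abs(v) for v in sliding_single_point_dft(signal, f, n_win)]
    for f, n_win in BANDS
]
```

Pairing a longer window with the lower frequency and a shorter window with the higher frequency is what gives such an analysis a multiresolution character. Each track can also be read as the output magnitude of a band-pass filter whose bandwidth depends on its window length.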
We decided to investigate the multiresolution concept and to incorporate it into the derivation of the well-known cepstral features, which are widely used in successful speech recognition systems [11]. In the following sections, we present an approach for determining the multiresolution auditory spectrum, describe the cepstral features derived from this spectrum, and finally evaluate these features on a simple phone recognition task.
One of the basic representations of a signal is its spectrum. An important consideration here is the equivalence between a spectrum measurement and the output of a filter (for a single spectral point) or of a bank of filters (for multiple spectral points) [8]. Consequently, we can notice that the Discrete Fourier …