Acoustic features based on auditory model and adaptive fractional Fourier transform for speech recognition

It is well known that the human auditory system achieves a level of performance that automatic speech recognition (ASR) systems cannot match, and that the fractional Fourier transform (FrFT) has unique advantages in nonstationary signal processing. In this paper, a Gammatone filterbank is applied to speech signals for front-end temporal filtering, and acoustic features are then extracted from the output subband signals using the FrFT. The transform order is critical for the FrFT. An order adaptation method based on the instantaneous frequency is proposed, and its performance is compared with that of a method based on the ambiguity function. ASR experiments are conducted on clean and noisy Mandarin digits. The results show that the proposed features achieve a significantly higher recognition rate than the MFCC baseline, and that the order adaptation method based on instantaneous frequency has much lower complexity than the one based on the ambiguity function. Furthermore, the FrFT-based features achieve the highest recognition rate when the proposed order adaptation method is used.
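
To make the idea of instantaneous-frequency-based order adaptation concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not specify. It assumes that a subband signal within one frame is approximately a linear chirp, estimates the instantaneous frequency from the phase of the analytic signal, and fits a straight line whose slope is the chirp rate. The function names `analytic_signal` and `estimate_chirp_rate`, the `trim` parameter, and the synthetic test chirp are all hypothetical. The mapping from chirp rate to FrFT order depends on the normalization convention used for the discrete FrFT and is left out here.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real sequence via the FFT.

    Negative-frequency bins are zeroed and positive-frequency bins are
    doubled, which is the standard discrete Hilbert-transform construction.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0          # Nyquist bin is kept once
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def estimate_chirp_rate(x, fs, trim=0.1):
    """Estimate chirp rate (Hz/s) and start frequency (Hz) of a frame.

    Assumption: the frame is well modelled by a single linear chirp.
    `trim` is the fraction of samples discarded at each end to reduce
    edge effects of the FFT-based analytic signal.
    """
    z = analytic_signal(np.asarray(x, dtype=float))
    phase = np.unwrap(np.angle(z))
    # Instantaneous frequency as the scaled first difference of the phase.
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    # Each difference is centred between two consecutive samples.
    t = (np.arange(len(inst_freq)) + 0.5) / fs
    lo = int(trim * len(inst_freq))
    hi = len(inst_freq) - lo
    slope, intercept = np.polyfit(t[lo:hi], inst_freq[lo:hi], 1)
    return float(slope), float(intercept)


# Synthetic example: a real linear chirp starting at 500 Hz, rising 2000 Hz/s.
fs = 8000.0
t = np.arange(int(0.2 * fs)) / fs
chirp = np.cos(2.0 * np.pi * (500.0 * t + 0.5 * 2000.0 * t ** 2))
rate, f_start = estimate_chirp_rate(chirp, fs)
print(f"estimated chirp rate: {rate:.1f} Hz/s, start frequency: {f_start:.1f} Hz")
```

A line fit to the instantaneous frequency needs only one FFT pair and a least-squares fit per frame, whereas an ambiguity-function approach typically requires a two-dimensional search. This is consistent with the lower complexity reported above, although the actual cost comparison depends on the implementation.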