Pitch- and Formant-Based Order Adaptation of the Fractional Fourier Transform and Its Application to Speech Recognition

The fractional Fourier transform (FrFT) has been proposed to improve time-frequency resolution in signal analysis and processing. However, how to select the FrFT order for the proper analysis of multicomponent signals such as speech is still debated. In this work, we investigate several order adaptation methods. First, FFT- and FrFT-based spectrograms of an artificially generated vowel are compared to demonstrate the methods. Second, an acoustic feature set combining MFCC and the FrFT is proposed, in which the FrFT orders are set adaptively by various methods based on pitch and formants. A tonal vowel discrimination test is designed to compare the performance of these methods using this feature set. The results show that the FrFT-MFCC features yield better discrimination of tones and also of vowels, especially with methods that use multiple transform orders. Third, speech recognition experiments are conducted on the clean intervocalic English consonants provided by the Consonant Challenge. The results show that the proposed features, with the different order adaptation methods, obtain slightly higher recognition rates than the reference MFCC-based recognizer.
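The idea behind pitch-based order adaptation can be illustrated with a minimal sketch. It assumes that f0 varies linearly within one analysis frame, so harmonic h behaves as a linear chirp whose rate is h times the f0 slope, and it uses the common dimensional normalization (time scaled by sqrt(N)/fs) to map each chirp rate to the FrFT order that cancels the chirp term in the FrFT kernel. The function names (`chirp_rate_to_frft_order`, `pitch_based_orders`, `frft_direct`) are hypothetical, and this mapping is not necessarily the order rule used in the paper. A formant-based variant would pass a formant's frequency slope to `chirp_rate_to_frft_order` instead. The FrFT here is a direct O(N^2) sampling of the continuous kernel, adequate only for orders roughly between 0.5 and 1.5; a practical system would use a fast or eigenvector-based discrete FrFT.

```python
import numpy as np


def chirp_rate_to_frft_order(chirp_rate_hz_per_s, n_samples, fs):
    """Map a linear chirp rate (Hz/s) to the FrFT order p that concentrates it.

    Assumes time is normalized by S = sqrt(N)/fs, so a chirp exp(j*pi*c*t**2)
    becomes exp(j*pi*k*u**2) with k = c*N/fs**2. The FrFT kernel term
    exp(j*pi*cot(alpha)*t**2) cancels it when cot(alpha) = -k, i.e.
    alpha = pi/2 + arctan(k), and the order is p = 2*alpha/pi.
    """
    k = chirp_rate_hz_per_s * n_samples / fs**2
    return 1.0 + (2.0 / np.pi) * np.arctan(k)


def pitch_based_orders(f0_start, f0_end, n_samples, fs, n_harmonics):
    """Hypothetical per-harmonic orders for one frame.

    Assumes a linear f0 trajectory across the frame; harmonic h then sweeps
    h times as fast as f0, so each harmonic gets its own order.
    """
    frame_dur = n_samples / fs
    f0_slope = (f0_end - f0_start) / frame_dur  # Hz/s
    return [chirp_rate_to_frft_order(h * f0_slope, n_samples, fs)
            for h in range(1, n_harmonics + 1)]


def frft_direct(x, p):
    """Direct O(N^2) FrFT by sampling the continuous kernel.

    Only intended for 0.5 <= |p| <= 1.5, where the kernel chirps are not
    aliased on the normalized grid. At p = 1 it reduces to a centered,
    unitary DFT.
    """
    n = len(x)
    alpha = p * np.pi / 2
    cot, csc = 1.0 / np.tan(alpha), 1.0 / np.sin(alpha)
    t = (np.arange(n) - n / 2) / np.sqrt(n)  # normalized grid, spacing 1/sqrt(N)
    u = t[:, None]                           # output coordinate (rows)
    kernel = np.exp(1j * np.pi * (cot * t**2 - 2 * csc * u * t + cot * u**2))
    scale = np.sqrt(1 - 1j * cot) / np.sqrt(n)  # sqrt(1 - j cot a) times grid spacing
    return scale * (kernel @ x)
```

For a 25 ms frame at 16 kHz with f0 rising from 120 to 220 Hz, `pitch_based_orders(120.0, 220.0, 400, 16000, 20)` returns orders slightly above 1 that grow with harmonic number (a falling f0 gives orders below 1). This is one way to see why a single transform order per frame cannot match every harmonic, which motivates the use of multiple transform orders.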
