Fundamental frequency estimation based on the joint time-frequency analysis of harmonic spectral structure

In this paper, we propose a new scheme that analyzes the spectral structure of speech signals for fundamental frequency estimation. First, we propose a pitch measure that detects the harmonic characteristics of voiced sounds in the spectrum of a speech signal. This measure exploits two properties: the spectrum of a voiced sound has distinct impulses at the fundamental frequency and its harmonics, and the energy of a voiced sound is dominated by the energy of these harmonic impulses. The spectrum can be obtained by the fast Fourier transform (FFT); however, its harmonic structure may be destroyed when the speech is corrupted by additive noise. To enhance the robustness of the proposed scheme in noisy environments, we apply the joint time-frequency analysis (JTFA) technique to obtain an adaptive representation of the spectrum of speech signals. The adaptive representation can accurately extract the important harmonic structure of noisy speech signals, but at a high computational cost. To reduce this cost, we further propose a fast adaptive representation (FAR) algorithm, which reduces the computational complexity of the original algorithm by 50%. The proposed fundamental frequency estimation scheme is evaluated on a large database, both with and without additive noise, and its performance is compared with that of other approaches on the same database. The experimental results show that the proposed scheme performs well on clean speech and is robust in noisy environments.
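The two spectral properties behind the pitch measure can be illustrated with a minimal sketch. It is not the measure defined in the paper, and it uses a plain FFT spectrum rather than the JTFA adaptive representation. It scores each candidate fundamental frequency by the fraction of spectral power found at its first few harmonic positions. The function names, the number of harmonics, the search range, and the synthetic test frame are all assumptions made for illustration.

```python
import numpy as np


def harmonic_energy_ratio(power, f0, fs, n_fft, n_harmonics=5, half_width=1):
    """Fraction of total spectral power found at the first harmonics of f0.

    Hypothetical helper: for each harmonic k*f0 it takes the largest power
    value within +/- half_width bins of the expected position.
    """
    total = power.sum()
    if total <= 0.0:
        return 0.0
    captured = 0.0
    for k in range(1, n_harmonics + 1):
        center = int(round(k * f0 * n_fft / fs))
        if center >= len(power):
            break  # harmonic lies above the Nyquist frequency
        lo = max(center - half_width, 0)
        hi = min(center + half_width + 1, len(power))
        captured += power[lo:hi].max()
    return captured / total


def estimate_f0(frame, fs, f_min=60.0, f_max=400.0, step=1.0):
    """Return the candidate f0 (Hz) with the highest ratio, and that ratio."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    candidates = np.arange(f_min, f_max + step, step)
    scores = np.array(
        [harmonic_energy_ratio(power, f, fs, len(frame)) for f in candidates]
    )
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])


# Synthetic voiced frame (assumed): 200 Hz fundamental, six decaying harmonics.
fs = 8000
n = 1024
t = np.arange(n) / fs
frame = sum(np.sin(2 * np.pi * 200.0 * k * t) / k for k in range(1, 7))

f0_hat, score = estimate_f0(frame, fs)
print(f"estimated f0: {f0_hat:.0f} Hz, harmonic energy ratio: {score:.2f}")
```

Using the same fixed number of harmonics for every candidate is a deliberate simplification: it keeps a subharmonic such as 100 Hz from collecting every harmonic of a 200 Hz voice. On noisy speech the FFT peaks become unreliable, which is the problem the adaptive representation is meant to address.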
