Paper Information - Extraction of Speech Pitch and Formant Frequencies using Discrete Wavelet Transform

Extraction of pitch and formant frequencies is an important problem in speech processing. The pitch frequency is the fundamental frequency of the speech signal, and the formant frequencies are essentially the resonance frequencies of the vocal tract. These frequencies vary across speakers and words, but they fall within a certain frequency range. In practice, the first three formants are sufficient for coding and other processing tasks. The most common methods for estimating formants are cepstral analysis and linear predictive coding (LPC). In this study, a wavelet-based method using filter bank concepts is presented to estimate these frequencies.
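The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a filter-bank view of the discrete wavelet transform can isolate the pitch band. It assumes a Haar wavelet, two decomposition levels, and an autocorrelation peak picker on the approximation band; none of these choices come from the paper, and the names `haar_lowpass_step`, `estimate_pitch`, and `synth_voiced` are hypothetical. Each low-pass-and-downsample step halves the analysed bandwidth, so after two levels an 8 kHz signal is reduced to a 0 to 1 kHz band sampled at 2 kHz, which still contains the fundamental.

```python
import math


def haar_lowpass_step(x):
    """One analysis step of a Haar filter bank: low-pass filter, then downsample by 2."""
    s = 1.0 / math.sqrt(2.0)
    return [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]


def estimate_pitch(signal, fs, levels=2, f_min=60.0, f_max=400.0):
    """Estimate pitch (Hz) from the DWT approximation band.

    Assumptions (not from the paper): Haar wavelet, `levels` decomposition
    levels, and an autocorrelation peak search restricted to [f_min, f_max].
    """
    approx = list(signal)
    for _ in range(levels):
        approx = haar_lowpass_step(approx)
    fs_eff = fs / (2 ** levels)  # sampling rate of the approximation band

    # Remove the mean so a DC offset does not dominate the autocorrelation.
    mean = sum(approx) / len(approx)
    approx = [v - mean for v in approx]

    lag_min = max(1, int(fs_eff / f_max))
    lag_max = min(len(approx) - 1, int(fs_eff / f_min))

    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(approx[n] * approx[n + lag] for n in range(len(approx) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs_eff / best_lag


def synth_voiced(f0, fs, n_samples):
    """Synthetic voiced-like test signal: a fundamental, two harmonics, and
    one higher harmonic standing in for high-frequency energy."""
    out = []
    for n in range(n_samples):
        t = n / fs
        out.append(
            math.sin(2 * math.pi * f0 * t)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * t)
            + 0.3 * math.sin(2 * math.pi * 3 * f0 * t)
            + 0.4 * math.sin(2 * math.pi * 20 * f0 * t)
        )
    return out


fs = 8000
x = synth_voiced(125.0, fs, 1024)
print(estimate_pitch(x, fs))  # prints 125.0
```

The resolution of this sketch is limited by the integer lag at the reduced sampling rate, so real speech would typically need interpolation around the peak and a voiced/unvoiced decision, both of which are omitted here. Formant estimation would analyse higher-frequency bands of the same filter bank and is not shown.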
