Scale transform in speech analysis

In this paper, we study the scale transform of the spectral envelope of speech utterances by different speakers. The study is motivated by the hypothesis that, for a given vowel, the formant frequencies of different speakers are approximately related by a scaling constant. The scale transform has the fundamental property that a function $X(f)$ and its scaled version $\sqrt{\alpha}\,X(\alpha f)$, for any $\alpha > 0$, have the same scale-transform magnitude. The methods presented here are therefore useful for reducing speaker-dependent variation in acoustic features. F-ratio tests indicate that scale-transform-based features separate vowels better than mel-transform-based features do. The data used to compare the features consist of 200 utterances of four vowels extracted from the TIMIT database.
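The invariance property can be checked numerically. The sketch below assumes Cohen's definition of the scale transform, $D(c) = \frac{1}{\sqrt{2\pi}} \int_0^\infty X(f)\, f^{-jc-1/2}\, df$. Substituting $f = e^u$ turns this into an ordinary Fourier transform of $X(e^u)\,e^{u/2}$, and scaling $X$ by $\alpha$ becomes a shift by $\ln\alpha$ in $u$, which leaves the magnitude unchanged. The toy envelope `envelope` (Gaussian bumps in log frequency standing in for formants), the helper names, the frequency range and the grid size are hypothetical choices for illustration, not the paper's implementation; a measured envelope would first have to be interpolated onto the logarithmic grid.

```python
import numpy as np


def log_grid(f_min, f_max, n_points):
    """Uniform grid in u = ln f covering [f_min, f_max)."""
    u = np.linspace(np.log(f_min), np.log(f_max), n_points, endpoint=False)
    return u, u[1] - u[0]


def scale_transform_magnitude(X, f_min, f_max, n_points=1024):
    """Approximate |D(c)| of the scale transform of X(f).

    With f = e^u, D(c) = (1/sqrt(2*pi)) * integral of X(e^u) e^{u/2} e^{-j c u} du,
    so |D(c)| is the magnitude of a Fourier transform over a uniform u grid.
    Assumes X is negligible outside [f_min, f_max].
    """
    u, du = log_grid(f_min, f_max, n_points)
    g = X(np.exp(u)) * np.exp(u / 2)                 # log-warped, sqrt(f)-weighted envelope
    D = np.fft.rfft(g) * du / np.sqrt(2 * np.pi)     # Riemann-sum approximation of the integral
    c = 2 * np.pi * np.fft.rfftfreq(n_points, d=du)  # scale variable c at each FFT bin
    return c, np.abs(D)


def envelope(f, centers=(500.0, 1500.0, 2500.0), width=0.08):
    """Hypothetical toy envelope: Gaussian bumps in log frequency as stand-in formants."""
    logf = np.log(f)
    return sum(np.exp(-(logf - np.log(fc)) ** 2 / (2 * width ** 2)) for fc in centers)


def scaled(X, alpha):
    """Return the scaled function f -> sqrt(alpha) * X(alpha * f)."""
    return lambda f: np.sqrt(alpha) * X(alpha * f)


if __name__ == "__main__":
    f_min, f_max, n = 50.0, 20000.0, 1024
    _, du = log_grid(f_min, f_max, n)
    alpha = np.exp(30 * du)  # an integer number of log-grid steps (about 1.19)
    c, mag = scale_transform_magnitude(envelope, f_min, f_max, n)
    _, mag_scaled = scale_transform_magnitude(scaled(envelope, alpha), f_min, f_max, n)
    print(f"alpha = {alpha:.3f}")
    print("max relative difference:", np.max(np.abs(mag - mag_scaled)) / mag.max())
```

Choosing $\alpha$ as an integer number of log-grid steps makes the shift exact on the grid, so the two magnitudes agree to rounding error; for other values of $\alpha$ the agreement is approximate and depends on how finely the envelope is sampled in $u$.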
