In pathological voices, both slow and fast pitch variations within an utterance are indicative of the patient's status. Moreover, the spectrogram of such voices usually shows strong noise components, closely related to the degree of perceived hoarseness. In the present paper, both pitch and noise variations are tracked during an utterance. This is accomplished by means of a two-step procedure for finding f0, based on robust estimation approaches, which allows a varying, optimal time window to be selected for analysis. The Normalised Noise Energy method [1] is revisited, and an adaptive version is applied to the optimised signal windows. Empty "dip" regions are avoided, and the method is applicable both to sustained vowels and to words. Simulations show the good performance of the proposed approach. Applied to real data, it allows the physician to track important voice parameters objectively.
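As a rough illustration of the idea summarised above, the following Python sketch first estimates f0 on a fixed window, then re-estimates it on a window spanning a whole number of pitch periods, and finally computes a noise-to-total energy ratio from the between-harmonic ("dip") bins of that window. It is a minimal sketch under stated assumptions, not the procedure of the paper: plain autocorrelation peak picking stands in for the robust estimators, the noise measure is a simplified dip-based ratio in the spirit of Normalised Noise Energy, and all function names, parameters and thresholds (`f0_autocorr`, `adaptive_frame`, `noise_energy_db`, `periods`, `dip`) are hypothetical.

```python
import numpy as np

def f0_autocorr(x, fs, fmin, fmax, unbiased=False):
    """Pick the autocorrelation peak whose lag lies between fs/fmax and fs/fmin."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..N-1
    if unbiased:
        r = r / (len(x) - np.arange(len(r)))           # compensate the shrinking overlap
    lo = max(1, int(fs / fmax))
    hi = min(int(fs / fmin), len(r) - 1)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return fs / lag

def adaptive_frame(x, fs, start, coarse_len=0.04, periods=8):
    """Step 1: coarse f0 on a fixed window. Step 2: refine on `periods` pitch periods."""
    # Assumed search range of 60-500 Hz for the coarse step.
    coarse = f0_autocorr(x[start:start + int(coarse_len * fs)], fs, 60.0, 500.0)
    n = int(round(periods * fs / coarse))              # window length follows the pitch
    frame = x[start:start + n]
    f0 = f0_autocorr(frame, fs, 0.8 * coarse, 1.25 * coarse, unbiased=True)
    return f0, frame

def noise_energy_db(frame, fs, f0, dip=0.3):
    """Noise-to-total energy ratio in dB, with the noise level read from dip bins."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # Distance of each bin to the nearest harmonic, in units of f0.
    dist = np.abs(freqs / f0 - np.round(freqs / f0))
    in_dip = dist > dip
    noise = spec[in_dip].mean() * len(spec)            # extrapolate dip level to all bins
    return 10.0 * np.log10(noise / spec.sum())

# Synthetic test signal (assumption): five harmonics of 200 Hz, with and without noise.
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
clean = sum(np.sin(2 * np.pi * 200.0 * k * t) / k for k in range(1, 6))
noisy = clean + 0.2 * np.random.default_rng(0).standard_normal(len(t))

f0_c, frame_c = adaptive_frame(clean, fs, start=0)
f0_n, frame_n = adaptive_frame(noisy, fs, start=0)
nne_c = noise_energy_db(frame_c, fs, f0_c)
nne_n = noise_energy_db(frame_n, fs, f0_n)
```

Tying the window length to the estimated pitch period keeps the harmonics well separated in the spectrum, so that the bins between them are dominated by noise rather than by leakage. If the window is too short relative to the pitch period, no usable dip bins remain, which is the situation an adaptive window is meant to avoid.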
[1] A. Fort et al. Parametric and non-parametric estimation of speech formants: application to infant cry. Medical Engineering & Physics, 1996.
[2] Ingrid Daubechies et al. The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. Inf. Theory, 1990.
[3] John H. L. Hansen et al. Discrete-Time Processing of Speech Signals. 1993.
[4] Bhaskar D. Rao et al. Model based processing of signals: a state space approach. Proc. IEEE, 1992.
[5] H. Kasuya et al. Normalized noise energy as an acoustic measure to evaluate pathologic voice. The Journal of the Acoustical Society of America, 1986.
[6] Claudia Manfredi et al. Acoustic measure of noise energy in vocal folds operated patients. 9th European Signal Processing Conference (EUSIPCO 1998), 1998.