Cepstrum-Based Estimation of the Harmonics-to-Noise Ratio for Synthesized and Human Voice Signals

Cepstral analysis is used to estimate the harmonics-to-noise ratio (HNR) in speech signals. The inverse Fourier transform of the liftered cepstrum approximates a noise baseline from which the HNR is estimated. The present study highlights the cepstrum-based noise baseline estimation process and shows it to be analogous to the action of a moving average filter applied to the power spectrum of voiced speech. The noise baseline, which is taken to approximate the noise-excited vocal tract, is influenced by the window length and the shape of the glottal source spectrum. Two existing estimation techniques are tested systematically using synthetically generated glottal flow and voiced speech signals with a priori knowledge of the HNR. The source influence is removed using a novel harmonic pre-emphasis technique. The results indicate that the present approach estimates the HNR accurately. The method is also investigated in a preliminary study using a set of normal and pathological voice data.
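The moving-average analogy can be checked numerically. The Python/NumPy sketch below does not reproduce the authors' estimation procedure. It rests on the following assumptions. The test signal is a hypothetical periodic one: f0 = 100 Hz, a 1/h harmonic roll-off standing in for glottal spectral tilt, and additive white noise. The analysis window is rectangular and spans exactly 25 periods, so the harmonics fall on DFT bins without leakage. The smoothing is applied to the log power spectrum in dB, since that is the quantity the cepstrum transforms.

The sketch first smooths the log spectrum with a centred moving average whose width equals the harmonic spacing. It then reproduces the same baseline by multiplying the real cepstrum by the Fourier transform of the moving-average kernel, which is a Dirichlet-shaped lifter. All function names and parameters are illustrative.

```python
import numpy as np


def boxcar_kernel(n, width):
    """Circularly centred moving-average kernel of odd `width` on n points."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the kernel is centred (zero-phase)")
    kernel = np.zeros(n)
    half = width // 2
    kernel[np.arange(-half, half + 1) % n] = 1.0 / width
    return kernel


def log_power_spectrum_db(x):
    """Full N-point log power spectrum in dB; the tiny floor avoids log10(0)."""
    return 10.0 * np.log10(np.abs(np.fft.fft(x)) ** 2 + 1e-12)


def baseline_by_moving_average(log_spec, width):
    """Centred circular moving average of the log spectrum, computed directly."""
    n = len(log_spec)
    half = width // 2
    return np.array([log_spec[np.arange(k - half, k + half + 1) % n].mean()
                     for k in range(n)])


def baseline_by_liftering(log_spec, width):
    """Same smoothing in the cepstral domain: real cepstrum times a lifter, then back."""
    n = len(log_spec)
    cepstrum = np.fft.ifft(log_spec).real              # real and even, as log_spec is real and even
    lifter = np.fft.fft(boxcar_kernel(n, width)).real  # Dirichlet-shaped; real because kernel is even
    baseline = np.fft.fft(cepstrum * lifter).real      # back to the smoothed log spectrum
    return baseline, cepstrum, lifter


# Hypothetical test signal: exactly 25 periods, no window (rectangular),
# so every harmonic lands on a DFT bin and there is no spectral leakage.
rng = np.random.default_rng(0)
fs, period, n_periods = 8000, 80, 25       # f0 = 100 Hz, N = 2000 samples
n = period * n_periods
t = np.arange(n) / fs
f0 = fs / period
# Assumed 1/h amplitude roll-off as a stand-in for glottal source tilt.
harmonics = sum(np.cos(2 * np.pi * h * f0 * t) / h for h in range(1, 40))
voiced = harmonics + 0.05 * rng.standard_normal(n)

log_spec = log_power_spectrum_db(voiced)
width = n // period                        # harmonic spacing in bins (25, odd)
direct = baseline_by_moving_average(log_spec, width)
via_cepstrum, cepstrum, lifter = baseline_by_liftering(log_spec, width)

# Both baselines agree to floating-point round-off.
print("max |direct - liftered| (dB):", np.max(np.abs(direct - via_cepstrum)))
# The lifter vanishes at every rahmonic quefrency (multiples of the pitch period).
print("max |lifter| at rahmonic quefrencies:", np.max(np.abs(lifter[period::period])))
```

The moving average spans exactly one harmonic spacing, so the lifter's zeros fall on multiples of the pitch period. As a result, the cepstral rahmonics are removed from the smoothed baseline. Setting `width` to a value that does not match the harmonic spacing (for example 23) moves those zeros off the rahmonic quefrencies, and part of the harmonic structure then survives in the baseline.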