On the use of autocorrelation analysis for pitch detection

One of the most time-honored methods of detecting pitch is to apply some type of autocorrelation analysis to speech which has been appropriately preprocessed. The goal of the speech preprocessing in most systems is to whiten, or spectrally flatten, the signal so as to eliminate the effects of the vocal tract spectrum on the detailed shape of the resulting autocorrelation function. The purpose of this paper is to present some results on several types of (nonlinear) preprocessing which can be used to effectively spectrally flatten the speech signal. The types of nonlinearities which are considered are classified by a nonlinear input-output quantizer characteristic. By appropriate adjustment of the quantizer threshold levels, both the ordinary (linear) autocorrelation analysis and the center clipping-peak clipping autocorrelation of Dubnowski et al. [1] can be obtained. Results are presented to demonstrate the degree of spectrum flattening obtained using these methods. Each of the proposed methods was tested on several of the utterances used in a recent pitch detector comparison study by Rabiner et al. [2]. Results of this comparison are included in this paper. One final topic which is discussed in this paper is an algorithm for adaptively choosing a frame size for an autocorrelation pitch analysis.
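
As a rough illustration of the idea of a threshold-controlled quantizer followed by autocorrelation, the following Python sketch may be helpful. It is not the implementation used in this paper or in [1]. The function names, the clipping fraction of 0.3, the lag search range, and the synthetic test frame are all assumptions chosen only for illustration. With the threshold set to zero, the center clipper passes the signal unchanged, so the same routine reduces to ordinary (linear) autocorrelation.

```python
import math

def center_clip(x, threshold):
    """Center clipper: zero samples inside +/-threshold, shift the rest toward zero.
    With threshold == 0 the signal is returned unchanged (linear case)."""
    out = []
    for s in x:
        if s > threshold:
            out.append(s - threshold)
        elif s < -threshold:
            out.append(s + threshold)
        else:
            out.append(0.0)
    return out

def three_level_clip(x, threshold):
    """Combined center and peak clipping: output is +1, 0, or -1."""
    return [1 if s > threshold else -1 if s < -threshold else 0 for s in x]

def autocorrelation(x, max_lag):
    """Short-time autocorrelation r[k] = sum_i x[i] * x[i + k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def estimate_period(frame, min_lag, max_lag, clip_fraction=0.3,
                    quantizer=three_level_clip):
    """Return the lag (in samples) of the largest autocorrelation peak,
    or None if the quantized frame is all zeros.
    clip_fraction is a hypothetical setting: threshold = fraction of frame peak."""
    peak = max(abs(s) for s in frame)
    y = quantizer(frame, clip_fraction * peak)
    r = autocorrelation(y, max_lag)
    if r[0] == 0:
        return None
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])

# Synthetic frame (assumption): three harmonics with an 80-sample period.
frame = [math.sin(2 * math.pi * n / 80)
         + 0.5 * math.sin(4 * math.pi * n / 80)
         + 0.3 * math.sin(6 * math.pi * n / 80)
         for n in range(400)]

# Three-level quantizer, then the linear case (threshold of zero).
print(estimate_period(frame, 20, 160))
print(estimate_period(frame, 20, 160, clip_fraction=0.0, quantizer=center_clip))
```

In this sketch the choice of quantizer and threshold is the only thing that distinguishes the methods; the autocorrelation and peak-picking steps are shared. A practical detector would also need a voiced/unvoiced decision, which is omitted here.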