Pitch perception physiology and psychophysics as a basis for the design of pitch detection algorithms

The quality of speech generated by a source‐filter vocoder depends heavily on accurate analysis of the source parameters. After decades of research, even state‐of‐the‐art pitch detection algorithms still make gross errors on signals that present no difficulty for the human listener. This study reviews a broad range of pitch detection algorithms, with particular attention to their plausibility in light of what is currently known about the psychophysics of pitch perception and the neural coding of speech signals. The principal conclusion of the review is that the most plausible model is a time‐domain pitch perception scheme proposed more than four decades ago by Licklider [J. C. R. Licklider, Experientia 7, 128–133 (1951)] and extended in more recent studies. The implications of these findings for source‐filter vocoders will be discussed, and an implementation of the Licklider model using level‐crossing interval histograms will be described.
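
As a rough illustration of the level‐crossing interval histogram idea, the following Python sketch pools the intervals between upward crossings of several amplitude levels and takes the most frequent interval as the pitch period. It is a minimal single‐channel sketch under stated assumptions, not the implementation described in this work: there is no auditory filterbank or neural model, and the function names, the three crossing levels, and the 60–400 Hz search range are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def upward_crossings(signal, level):
    """Sample indices i where the signal rises through `level` between i-1 and i."""
    return [i for i in range(1, len(signal))
            if signal[i - 1] < level <= signal[i]]


def interval_histogram(signal, levels, max_lag):
    """Pool all-order intervals (in samples) between upward crossings of each level."""
    hist = Counter()
    for level in levels:
        times = upward_crossings(signal, level)
        for a in range(len(times)):
            for b in range(a + 1, len(times)):
                lag = times[b] - times[a]
                if lag > max_lag:
                    break  # times are sorted, so later lags only grow
                hist[lag] += 1
    return hist


def estimate_f0(signal, sample_rate, fractions=(0.25, 0.5, 0.75),
                f0_min=60.0, f0_max=400.0):
    """Estimate F0 in Hz from the largest histogram peak, or None if no peak exists.

    `fractions` (crossing levels relative to peak amplitude) and the
    f0_min/f0_max search range are illustrative assumptions.
    """
    peak = max((abs(x) for x in signal), default=0.0)
    if peak == 0.0:
        return None
    levels = [f * peak for f in fractions]
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    hist = interval_histogram(signal, levels, max_lag)
    candidates = {lag: n for lag, n in hist.items() if lag >= min_lag}
    if not candidates:
        return None
    # Highest count wins; ties go to the shorter interval.
    best_lag = max(candidates, key=lambda lag: (candidates[lag], -lag))
    return sample_rate / best_lag


def missing_fundamental_tone(sample_rate=8000, duration=0.1):
    """Synthetic test tone: harmonics 2 and 3 of 100 Hz, with no energy at 100 Hz."""
    n_samples = int(sample_rate * duration)
    return [math.sin(2 * math.pi * 200 * n / sample_rate)
            + math.sin(2 * math.pi * 300 * n / sample_rate)
            for n in range(n_samples)]


if __name__ == "__main__":
    tone = missing_fundamental_tone()
    print(estimate_f0(tone, 8000))
```

The synthetic tone contains only the second and third harmonics of 100 Hz, yet its waveform repeats every 80 samples at 8 kHz, so the pooled histogram peaks at an interval corresponding to 100 Hz. This is the kind of signal on which a listener hears the missing fundamental. Pooling all‐order intervals rather than only adjacent ones keeps crossings within a single period from dominating the histogram.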