A Pitch Detector Based on a Generalized Correlation Function

This paper proposes a novel pitch determination algorithm (PDA) based on correntropy, a recently introduced generalized correlation function. Correntropy is a positive definite kernel function that implicitly maps the original signal nonlinearly into a high-dimensional reproducing kernel Hilbert space (RKHS) and efficiently computes the generalized correlation in that space. Because it incorporates the kernel function, correntropy can exploit higher-order statistics to enhance the resolution of pitch estimation. The proposed PDA computes the summary of correntropy functions from the outputs of an equivalent rectangular bandwidth (ERB) filter bank. We present simulations of pitch determination for a single vowel, for double vowels, and on a benchmark database. The simulations show that correntropy exhibits much better resolution than conventional autocorrelation in pitch determination and outperforms other PDAs on the benchmark database.
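To make the idea concrete, the following is a minimal sketch of a sample correntropy estimate, V(tau) = mean over n of k(x[n] - x[n - tau]), used to pick a pitch period. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the fixed kernel width `sigma`, the function names `autocorrentropy` and `estimate_pitch`, and the search range `fmin`/`fmax` are all choices made here, and the ERB filter bank and the summation across channels are omitted, so the sketch operates on a single channel only.

```python
import numpy as np


def autocorrentropy(x, max_lag, sigma):
    """Sample correntropy V[tau] for tau = 0..max_lag with a Gaussian kernel.

    V[tau] is the average of G_sigma(x[n] - x[n - tau]) over the samples
    available at that lag. The Gaussian kernel is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    norm = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)
    v = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        # Differences between the signal and its copy delayed by tau samples.
        d = x[tau:] - x[: len(x) - tau]
        v[tau] = norm * np.mean(np.exp(-(d ** 2) / (2.0 * sigma ** 2)))
    return v


def estimate_pitch(x, fs, fmin=120.0, fmax=400.0, sigma=0.5):
    """Return the frequency whose lag maximizes the correntropy in [fmin, fmax].

    fmin, fmax and sigma are illustrative defaults, not values from the paper.
    """
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    v = autocorrentropy(x, max_lag, sigma)
    # Search only lags that correspond to the allowed pitch range.
    best_lag = min_lag + int(np.argmax(v[min_lag:]))
    return fs / best_lag
```

A narrower kernel makes the peaks of V sharper around lags where the signal repeats, which is the intuition behind the resolution gain described above; in practice `sigma` would need to be tuned to the signal amplitude.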
