Unsupervised Classification of Voiced Speech and Pitch Tracking Using Forward-Backward Kalman Filtering

Detecting voiced speech, estimating the fundamental frequency, and tracking pitch values over time are crucial subtasks in a variety of speech processing techniques. Many algorithms have been developed for each of the three subtasks. We present a new algorithm that integrates all three into a single procedure. The algorithm can be applied to pre-recorded speech utterances in the presence of considerable background noise. We combine a collection of standard metrics, such as the zero-crossing rate, to formulate an unsupervised voicing classifier. Pitch values are estimated with a hybrid autocorrelation-based technique, and a forward-backward Kalman filter smooths the estimated pitch contour. Experiments show that the proposed method compares favorably with current state-of-the-art pitch detection algorithms.
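To illustrate the smoothing step only, the sketch below runs a scalar Kalman filter forward over a sequence of per-frame pitch estimates and then applies a Rauch-Tung-Striebel backward pass. This is one common form of forward-backward Kalman smoothing and is not necessarily the formulation used in the paper. The random-walk pitch model, the function name `kalman_smooth`, and the variances `process_var` and `obs_var` are assumptions chosen for illustration.

```python
def kalman_smooth(observations, process_var=4.0, obs_var=100.0):
    """Smooth a scalar sequence (e.g. pitch in Hz per voiced frame).

    Assumed model (hypothetical, not taken from the paper):
        state:       x[k] = x[k-1] + w,  Var(w) = process_var
        observation: z[k] = x[k]   + v,  Var(v) = obs_var
    """
    n = len(observations)
    if n == 0:
        return []

    pred_mean = [0.0] * n  # state mean before seeing z[k]
    pred_var = [0.0] * n   # state variance before seeing z[k]
    filt_mean = [0.0] * n  # state mean after seeing z[k]
    filt_var = [0.0] * n   # state variance after seeing z[k]

    # Forward pass: standard Kalman predict/update recursion.
    for k, z in enumerate(observations):
        if k == 0:
            # Assumed prior: centred on the first observation.
            pred_mean[k], pred_var[k] = float(z), obs_var
        else:
            pred_mean[k] = filt_mean[k - 1]
            pred_var[k] = filt_var[k - 1] + process_var
        gain = pred_var[k] / (pred_var[k] + obs_var)
        filt_mean[k] = pred_mean[k] + gain * (z - pred_mean[k])
        filt_var[k] = (1.0 - gain) * pred_var[k]

    # Backward pass: Rauch-Tung-Striebel correction using future frames.
    smooth_mean = filt_mean[:]
    for k in range(n - 2, -1, -1):
        c = filt_var[k] / pred_var[k + 1]
        smooth_mean[k] = filt_mean[k] + c * (smooth_mean[k + 1] - pred_mean[k + 1])
    return smooth_mean


# Usage: a contour near 120 Hz with one octave error (240 Hz) in the middle.
contour = [120.0] * 5 + [240.0] + [120.0] * 5
smoothed = kalman_smooth(contour)
```

Because the backward pass uses later frames, this kind of smoother suits pre-recorded utterances rather than streaming input. A larger `obs_var` relative to `process_var` trusts the model more and pulls outliers such as octave errors more strongly toward their neighbours.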
