Weighted Multi-band Summary Correlogram (MBSC)-based Pitch Estimation and Voice Activity Detection for Noisy Speech

Pitch estimation and voice activity detection (VAD), the task of classifying an acoustic signal stream into voiced and unvoiced segments, are crucial preprocessing tools for a wide range of speech applications. In this paper, a weighted multi-band summary correlogram (MBSC)-based pitch estimation algorithm (PEA) with voice activity detection is proposed. The PEA performs pitch estimation and voiced/unvoiced (V/UV) detection via novel signal processing schemes designed to enhance the MBSC’s peaks at the most likely pitch period. The technique first computes an independent normalized autocorrelation function (NACF) for each channel or frame, a computation that is relatively insensitive to phase changes across channels. It then filters these NACFs to remove a significant portion of the content outside the 50-500 Hz pitch range and derives an adaptive threshold from the filtered NACFs. This threshold acts as a pitch position indicator and a voiced/unvoiced region detector. The accurate pitch period is then obtained from the weighted MBSC. Among the algorithms evaluated, the proposed algorithm has the lowest gross pitch error (%GPE) for noisy speech in the evaluation set. The proposed PEA also achieves the lowest average voicing detection errors.
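
The pipeline described in the abstract (per-band NACF, restriction to the 50-500 Hz pitch range, a weighted summary across bands, and a threshold for the V/UV decision) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `nacf` and `estimate_pitch`, the uniform default weights, the biased normalization by the zero-lag energy, and the fixed threshold of 0.5 (standing in for the adaptive threshold) are all assumptions made for illustration. The band-splitting filterbank is assumed to have been applied already, so the input is one frame per band.

```python
def nacf(frame, max_lag):
    """Autocorrelation for lags 0..max_lag, normalized by the zero-lag energy.

    Assumption: a biased estimate (no per-lag length correction), so values
    taper slightly with lag and the first pitch peak tends to be the largest.
    """
    n = len(frame)
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def estimate_pitch(band_frames, fs, weights=None,
                   fmin=50.0, fmax=500.0, threshold=0.5):
    """Return (f0_hz, voiced) for one analysis frame.

    band_frames: list of equal-length sample lists, one per band
                 (hypothetical output of a filterbank not shown here).
    weights:     per-band weights; uniform if omitted (an assumption).
    threshold:   fixed stand-in for the paper's adaptive threshold.
    """
    min_lag = int(fs / fmax)   # shortest pitch period considered
    max_lag = int(fs / fmin)   # longest pitch period considered
    if weights is None:
        weights = [1.0] * len(band_frames)
    total = sum(weights)

    # Weighted summary of the per-band NACFs, restricted to the pitch range.
    summary = [0.0] * (max_lag + 1)
    for w, frame in zip(weights, band_frames):
        r = nacf(frame, max_lag)
        for lag in range(min_lag, max_lag + 1):
            summary[lag] += w * r[lag] / total

    # The highest peak gives the pitch period; its height decides V/UV.
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: summary[k])
    voiced = summary[best_lag] >= threshold
    return (fs / best_lag if voiced else 0.0), voiced
```

As a usage example, passing two bands that each contain a 200 Hz tone sampled at 8 kHz (400 samples per frame) should place the summary peak at a lag of 40 samples, while an all-zero frame should be labeled unvoiced. A practical system would replace the fixed threshold with one adapted to the filtered NACFs, as the abstract describes.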
