Autocorrelation of the Speech Multi-Scale Product for Voicing Decision and Pitch Estimation

In this work, we present an algorithm for voiced/unvoiced decision and pitch estimation from speech signals. Our approach classifies the peaks of the autocorrelation of the speech multi-scale product. The multi-scale product is obtained by multiplying the wavelet transform coefficients of the speech signal at three successive dyadic scales. Its autocorrelation function is computed frame by frame, over frames of a fixed length. Experimental results show that the approach is robust and effective, and that it outperforms several existing algorithms in both clean and noisy environments.
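The pipeline described above (wavelet coefficients at three dyadic scales, their product, then a frame-wise autocorrelation) can be illustrated with a minimal sketch. Several choices here are assumptions made only for illustration and are not taken from the paper: the first derivative of a Gaussian stands in for the analysis wavelet, the scales are set to 2, 4 and 8 samples, and a single threshold on the strongest normalized autocorrelation peak replaces the peak classification used for the voicing decision. The function names and parameters (`multiscale_product`, `estimate_pitch`, `voicing_threshold`, `fmin`, `fmax`) are hypothetical.

```python
import numpy as np

def gaussian_derivative_wavelet(scale, width=4.0):
    """Sampled first derivative of a Gaussian (assumed stand-in wavelet)."""
    half = int(np.ceil(width * scale))
    t = np.arange(-half, half + 1, dtype=float)
    psi = -t * np.exp(-t**2 / (2.0 * scale**2))
    return psi / np.sqrt(np.sum(psi**2))  # normalize to unit energy

def multiscale_product(x, scales=(2, 4, 8)):
    """Product of the wavelet coefficients of x at three dyadic scales."""
    p = np.ones(len(x))
    for s in scales:
        p *= np.convolve(x, gaussian_derivative_wavelet(s), mode="same")
    return p

def estimate_pitch(frame, fs, fmin=60.0, fmax=500.0, voicing_threshold=0.5):
    """Return an F0 estimate in Hz, or None if the frame looks unvoiced.

    Assumption: a single threshold on the strongest autocorrelation peak
    replaces the peak classification described in the abstract.
    """
    p = multiscale_product(np.asarray(frame, dtype=float))
    p = p - p.mean()
    # One-sided autocorrelation: lags 0 .. len(p) - 1.
    acf = np.correlate(p, p, mode="full")[len(p) - 1:]
    if acf[0] <= 0.0:
        return None  # silent frame, nothing to analyze
    acf = acf / acf[0]
    # Search only lags that correspond to plausible pitch periods.
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(acf) - 1)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    if acf[lag] < voicing_threshold:
        return None
    return fs / lag

# Usage: a synthetic 100 Hz impulse train sampled at 8 kHz (64 ms frame).
fs = 8000
frame = np.zeros(512)
frame[40::80] = 1.0
f0 = estimate_pitch(frame, fs)
```

Multiplying the coefficients across scales reinforces sharp transitions that persist at every scale, such as glottal closure instants, and attenuates noise that does not. This is why the autocorrelation is computed on the product rather than on the raw waveform.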
