The importance of optimal parameter setting for pitch extraction.

In this study we compare the performance of five pitch extraction algorithms: Autocorrelation (AC), Cross-Correlation (CC), and Subharmonic Summation (SHS), as implemented in Praat [Boersma and Weenink (2010)]; the Robust Algorithm for Pitch Tracking (RAPT), as implemented in ESPS [Talkin (1995)]; and SWIPE' [Camacho (2007)]. Recent research showed that SHS and SWIPE' outperformed the other algorithms on two speech databases with electroglottographic (EGG) reference values [Camacho (2007)]. That study, however, used a fixed search range of 40-800 Hz for all speakers, regardless of sex or speaker-specific pitch characteristics. In the current study, we adopt the parameter optimization strategy of De Looze and Rauzy (2009) to calculate pitch floor and ceiling values for each speaker individually. Our results show that the optimized parameters substantially improve the accuracy of the AC, CC, and RAPT algorithms (especially for the female speakers), so that all five algorithms perform similarly. The gross error rates of the five algorithms range from 0.1% to 0.3% (N = 18 098) on the FDA database [Bagshaw (1994)] and from 0.2% to 0.4% (N = 11 527) on the Keele database [Plante et al. (1995)]. Our study thus highlights the importance of pre-processing the speech signal to determine optimal speaker-specific parameters for pitch extraction.
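The speaker-specific range optimization and the error metric can be sketched in a few lines. This is a minimal Python sketch, not the authors' implementation. It assumes a two-pass workflow: any pitch tracker is first run with the wide 40-800 Hz range, and the voiced F0 values from that pass set the limits for a second pass. The rule floor = 0.72 * q35 - 10 and ceiling = 1.9 * q65 + 10 (q35 and q65 being the 35th and 65th percentiles of the first-pass F0, in Hz) is a commonly cited form of the De Looze and Rauzy (2009) strategy; it is an assumption here and should be checked against that paper. The gross error rate uses an assumed 20% relative-deviation threshold over frames that both the reference and the estimate mark as voiced. The `track` callable and all function names are hypothetical, and unvoiced frames are encoded as 0 Hz.

```python
import numpy as np


def speaker_pitch_range(f0_first_pass, low_pct=35.0, high_pct=65.0):
    """Derive a speaker-specific (floor, ceiling) in Hz from first-pass F0 values.

    ASSUMPTION: percentiles and constants follow a commonly cited form of the
    De Looze and Rauzy (2009) rule; they are not given in the abstract.
    Frames with F0 == 0 are treated as unvoiced and ignored.
    """
    f0 = np.asarray(f0_first_pass, dtype=float)
    voiced = f0[f0 > 0]
    if voiced.size == 0:
        raise ValueError("first pass produced no voiced frames")
    q_low, q_high = np.percentile(voiced, [low_pct, high_pct])
    floor = 0.72 * q_low - 10.0
    ceiling = 1.9 * q_high + 10.0
    return float(floor), float(ceiling)


def two_pass_track(signal, track, first_floor=40.0, first_ceiling=800.0):
    """Run a pitch tracker twice: wide range first, then speaker-specific range.

    `track(signal, floor, ceiling) -> array of F0 (Hz, 0 = unvoiced)` is a
    HYPOTHETICAL stand-in for any tracker (AC, CC, SHS, RAPT, SWIPE', ...).
    """
    first_pass = track(signal, first_floor, first_ceiling)
    floor, ceiling = speaker_pitch_range(first_pass)
    return track(signal, floor, ceiling), (floor, ceiling)


def gross_error_rate(f0_est, f0_ref, tolerance=0.2):
    """Fraction of frames voiced in both reference and estimate whose estimate
    deviates from the reference by more than `tolerance` (relative).

    ASSUMPTION: 20% threshold; voicing errors would be scored separately.
    """
    est = np.asarray(f0_est, dtype=float)
    ref = np.asarray(f0_ref, dtype=float)
    both_voiced = (ref > 0) & (est > 0)
    if not both_voiced.any():
        raise ValueError("no frames are voiced in both reference and estimate")
    rel_dev = np.abs(est[both_voiced] - ref[both_voiced]) / ref[both_voiced]
    return float(np.mean(rel_dev > tolerance))


if __name__ == "__main__":
    # Toy first-pass contour: voiced values 100-140 Hz, unvoiced frames as 0.
    toy = np.array([0, 100, 110, 120, 130, 140, 0], dtype=float)
    print(speaker_pitch_range(toy))  # approx (72.08, 249.4)
    ref = [0, 100, 200, 150, 120]
    est = [90, 105, 100, 150, 0]
    print(gross_error_rate(est, ref))  # 1 of 3 comparable frames -> 0.333...
```

Reusing the same tracker for both passes keeps the sketch tracker-agnostic. A narrower second-pass range mainly leaves less room for octave errors, which is one plausible reason why range optimization helps the AC, CC, and RAPT trackers most.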

[1] Stéphane Rauzy et al. Automatic detection and prediction of topic changes through automatic detection of register variations and pause duration. INTERSPEECH, 2009.

[2] P. Boersma. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. 1993.

[3] Paul Boersma et al. Praat, a system for doing phonetics by computer. 2002.

[4] P. Boersma. Praat: doing phonetics by computer (version 5.1.05). 2009.

[5] Gökhan Tür et al. Prosody-based automatic segmentation of speech into sentences and topics. Speech Communication, 2000.

[6] Paul Christopher Bagshaw et al. Automatic prosodic analysis for computer aided pronunciation teaching. 1994.

[7] D. J. Hermes et al. Measurement of pitch by subharmonic summation. The Journal of the Acoustical Society of America, 1988.

[8] Arturo Camacho Lozano et al. SWIPE: A sawtooth waveform inspired pitch estimator for speech and music. 2011.

[9] B. Atal. Automatic speaker recognition based on pitch contours. 1969.

[10] Paul C. Bagshaw et al. Enhanced pitch tracking and the processing of F0 contours for computer aided intonation teaching. EUROSPEECH, 1993.

[11] John G. Harris et al. A sawtooth waveform inspired pitch estimator for speech and music. The Journal of the Acoustical Society of America, 2008.

[12] David Talkin et al. A Robust Algorithm for Pitch Tracking (RAPT). 2005.

[13] Fabrice Plante et al. A pitch extraction reference database. EUROSPEECH, 1995.