A Robust and Low Computational Cost Pitch Estimation Method

Pitch estimation is widely used in speech and audio signal processing. However, the harmonic-structure models that current pitch estimation methods rely on do not always match the harmonic distribution of actual signals. Because of the structure of the vocal tract, the acoustic nature of musical instruments, and spectral leakage, the harmonic frequencies of speech and audio signals often deviate slightly from integer multiples of the pitch. This paper starts from the summation of residual harmonics (SRH) method and makes two main modifications. First, the constraint that spectral peaks lie at strict integer multiples of the pitch is relaxed to allow slight deviation, which helps capture the harmonics. Second, a low-computational-cost main pitch segment extension scheme is proposed to exploit the smoothness prior of pitch more efficiently. In addition, the pitch segment extension scheme is integrated into the SRH method's voiced/unvoiced decision to reduce short-term errors. Accuracy comparisons with ten pitch estimation methods show that the proposed method has better overall accuracy and robustness. Timing experiments show that the time cost of the proposed method is around 1/8 that of the state-of-the-art fast NLS method on the experimental computer.
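To illustrate the first modification, the following is a minimal sketch and not the authors' implementation. It assumes the commonly cited SRH criterion, SRH(f) = E(f) + Σ_{k=2..N} [E(k·f) − E((k − 1/2)·f)], and relaxes it by taking the largest spectral value within a small relative window around each expected harmonic. The function name `relaxed_srh`, the `tolerance` parameter, the 1 Hz bin spacing, and the toy spectrum are all assumptions made for illustration; the real SRH method operates on the spectrum of the LPC residual, which is not computed here.

```python
def relaxed_srh(spectrum, f0_candidates, n_harmonics=5, tolerance=0.0):
    """Score pitch candidates with an SRH-style criterion.

    spectrum      : magnitude values, one per 1 Hz bin (an assumption of this sketch)
    f0_candidates : candidate pitches in Hz
    tolerance     : relative half-width of the search window around each
                    expected harmonic; 0.0 gives the strict integer-multiple rule
    """
    def peak_near(center, half_width):
        # Largest spectral value within +/- half_width Hz of the expected harmonic.
        lo = max(0, int(round(center - half_width)))
        hi = min(len(spectrum) - 1, int(round(center + half_width)))
        return max(spectrum[lo:hi + 1]) if lo <= hi else 0.0

    def value_at(freq):
        # Spectral value at the bin nearest to freq, or 0 outside the spectrum.
        idx = int(round(freq))
        return spectrum[idx] if 0 <= idx < len(spectrum) else 0.0

    scores = {}
    for f0 in f0_candidates:
        score = peak_near(f0, tolerance * f0)
        for k in range(2, n_harmonics + 1):
            # Reward energy near the k-th harmonic, penalize energy midway
            # between harmonics k-1 and k.
            score += peak_near(k * f0, tolerance * k * f0) - value_at((k - 0.5) * f0)
        scores[f0] = score
    return scores


# Toy spectrum: a 100 Hz pitch whose upper harmonics are slightly sharp.
spectrum = [0.0] * 1000
for peak_hz in (100, 202, 304, 407, 510):
    spectrum[peak_hz] = 1.0

candidates = [50, 100, 150, 200]
strict = relaxed_srh(spectrum, candidates, tolerance=0.0)
relaxed = relaxed_srh(spectrum, candidates, tolerance=0.02)
best_relaxed = max(relaxed, key=relaxed.get)
print(strict)
print(relaxed)
print(best_relaxed)
```

On this toy input the strict rule finds only the fundamental, so the 100 Hz candidate scores 1.0 and ties with its 50 Hz subharmonic, whereas a 2% window recovers all five peaks and the 100 Hz candidate scores 5.0. The window width is a design choice: too wide a window lets neighbouring candidates claim the same peaks.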
