Robust Pitch Extraction Method for the HMM-Based Speech Synthesis System

This letter proposes an efficient method for extracting pitch from speech signals for the hidden Markov model (HMM)-based speech synthesis system (HTS). In the proposed method, voicing detection and pitch estimation are performed on the mean signal obtained from the continuous wavelet transform (CWT) coefficients. The proposed pitch extraction method is integrated into HTS, and its performance is evaluated on the CMU Arctic and Keele databases. Both objective and subjective evaluations show that speech synthesized with the proposed pitch estimation method is of much better quality than speech from HTS systems built with two state-of-the-art pitch extraction methods employed in HTS: the robust algorithm for pitch tracking (RAPT) and speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT).
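The abstract does not specify the wavelet, the scales, or how the mean signal is turned into voicing decisions and F0 values. The following Python sketch therefore fills those in with assumptions, and it should not be read as the authors' implementation. It uses a Ricker (Mexican-hat) mother wavelet. It uses three hypothetical scales of 8, 16, and 32 samples, whose peak frequencies lie at roughly 450, 225, and 110 Hz for 16 kHz audio. It takes the "mean signal" to be the average of the CWT coefficients across scales. F0 comes from a frame-wise normalized autocorrelation, and a peak threshold of 0.5 serves as the voicing decision. The function names `ricker`, `cwt_mean_signal`, and `estimate_f0` and all parameter values are illustrative.

```python
import numpy as np


def ricker(points: int, a: float) -> np.ndarray:
    """Ricker (Mexican-hat) wavelet of width `a`, sampled at `points` samples."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def cwt_mean_signal(x: np.ndarray, scales) -> np.ndarray:
    """Assumed 'mean signal': CWT coefficients averaged over all scales."""
    rows = []
    for a in scales:
        n = 2 * int(5 * a) + 1  # odd, symmetric kernel keeps the filter zero-phase
        if n > len(x):
            raise ValueError("signal is shorter than the widest wavelet")
        rows.append(np.convolve(x, ricker(n, a), mode="same"))
    return np.mean(rows, axis=0)


def estimate_f0(x, fs, scales=(8, 16, 32), frame_s=0.04, hop_s=0.01,
                fmin=60.0, fmax=400.0, voicing_threshold=0.5):
    """Frame-wise F0 in Hz (0.0 marks unvoiced frames), from the CWT mean signal.

    All defaults are hypothetical choices for 16 kHz speech.
    """
    m = cwt_mean_signal(np.asarray(x, dtype=float), scales)
    n_frame, hop = int(round(frame_s * fs)), int(round(hop_s * fs))
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    f0 = []
    for start in range(0, len(m) - n_frame + 1, hop):
        frame = m[start:start + n_frame]
        frame = frame - frame.mean()
        # Autocorrelation for non-negative lags 0 .. n_frame - 1.
        ac = np.correlate(frame, frame, mode="full")[n_frame - 1:]
        if ac[0] <= 1e-12:  # no energy in the frame: treat as unvoiced
            f0.append(0.0)
            continue
        ac = ac / ac[0]  # normalize so that ac[0] == 1
        # Strongest periodicity peak inside the allowed pitch range.
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        # Voicing decision: the frame is voiced only if that peak is strong enough.
        f0.append(fs / lag if ac[lag] >= voicing_threshold else 0.0)
    return np.array(f0)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.5 * fs)) / fs
    # 0.5 s of a 200 Hz tone followed by 0.5 s of silence.
    x = np.concatenate([np.sin(2 * np.pi * 200 * t), np.zeros(int(0.5 * fs))])
    print(np.round(estimate_f0(x, fs), 1))
```

For a 16 kHz signal `x`, `estimate_f0(x, 16000)` returns one value per 10 ms hop, and 0.0 marks unvoiced frames. An HTS front end would still need to map this onto its own log-F0 representation. Using a single normalized-autocorrelation peak for both the voicing and the pitch decision keeps the sketch short. A practical extractor would likely add smoothing or tracking across frames.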
