An efficient and robust pitch marking algorithm on the speech waveform for TD-PSOLA

In a Text-to-Speech system based on time-domain techniques that employ pitch-synchronous manipulation of the speech waveform, one of the most important factors affecting output quality is how the analysis points of the speech signal, i.e. the analysis pitchmarks, are estimated and where they are actually placed. In this paper we present our methodology for calculating the pitchmarks of a speech waveform: a pitchmark detection algorithm that, after thorough experimentation and comparison with other algorithms, proved to perform better with our Text-to-Speech synthesizer based on TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add).
