Multi-source based acoustic model for speech synthesis

The traditional source-filter model has an obvious limitation in pitch modification for speech synthesis because it lacks any processing of spectral distortion. To solve this problem, the paper analyzes in detail the spectral features of the voice source across various F0 ranges and timbres, and builds a Multi-Source (MS) model from the analysis results by classifying the voice source into different types. The model improves the quality of synthesized speech across various speaking moods.
