An HMM-Based Brazilian Portuguese Speech Synthesizer and Its Characteristics

DOI: 10.14209/jcis.2006.11

Research in speech synthesis has made great progress recently, perhaps motivated by its numerous applications, such as text-to-speech converters and dialog systems. The technical literature reports improvements to existing state-of-the-art techniques as well as new ideas for altering voice characteristics, with eventual application to different languages. Nevertheless, despite the attention the field has been receiving, unit selection and concatenation of waveform segments remains the most popular approach among those currently available. In this paper, we report how a synthesizer for Brazilian Portuguese was constructed using a technique in which the speech waveform is generated from parameters determined directly from Hidden Markov Models (HMMs). Compared with systems based on unit selection and concatenation, the proposed synthesizer has the advantage of being trainable, and it uses contextual factors that include information from different levels of the following acoustic units: phones, syllables, words, phrases and utterances. This information takes effect through a set of questions for context clustering. Both the spectral and the prosodic characteristics of the system are thus managed by decision trees generated for each of the following parameters: mel-cepstral coefficients, fundamental frequency and state durations. As is typical of the HMM-based technique, synthesized speech with quality comparable to that of commercial applications built on unit selection and concatenation can be obtained even from a database as small as eighteen minutes of speech. This was tested through a subjective comparison of samples from the proposed synthesizer and from other systems currently available for Brazilian Portuguese.
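As a rough illustration of how context-clustering questions can route a phone's context to a shared model, the following is a minimal Python sketch. The factor names, the question names, the leaf identifiers and the hand-written tree are all hypothetical and chosen only for illustration; in the approach described above the trees are generated from training data, one for each of the mel-cepstral, fundamental frequency and state duration parameters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical contextual factors for one phone (illustrative names)."""
    phone: str
    prev_phone: str
    next_phone: str
    syllable_stressed: bool
    word_pos_in_phrase: int  # 1-based position of the word in its phrase


Question = Callable[[Context], bool]

VOWELS = {"a", "e", "i", "o", "u"}

# Each question is a yes/no predicate over the context (assumed set).
QUESTIONS: Dict[str, Question] = {
    "C-Vowel": lambda c: c.phone in VOWELS,
    "L-Nasal": lambda c: c.prev_phone in {"m", "n"},
    "Syl-Stressed": lambda c: c.syllable_stressed,
    "Word-PhraseInitial": lambda c: c.word_pos_in_phrase == 1,
}


@dataclass
class Node:
    """Internal nodes hold a question name; leaves hold a clustered model id."""
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    leaf_id: Optional[str] = None


def find_leaf(node: Node, ctx: Context) -> str:
    """Walk the tree, answering one question per internal node."""
    while node.leaf_id is None:
        node = node.yes if QUESTIONS[node.question](ctx) else node.no
    return node.leaf_id


# Hand-written toy tree standing in for a data-driven one.
TREE = Node(
    "C-Vowel",
    yes=Node(
        "Syl-Stressed",
        yes=Node(leaf_id="mcep_leaf_1"),
        no=Node(leaf_id="mcep_leaf_2"),
    ),
    no=Node(leaf_id="mcep_leaf_3"),
)

# A stressed "a" between "k" and "z", in the first word of the phrase.
ctx = Context("a", "k", "z", True, 1)
print(find_leaf(TREE, ctx))
```

Because contexts that answer every question the same way end up at the same leaf, contexts never seen in training can still be mapped to a model, which is one reason a small database can suffice.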
