Paper information - Choose the best to modify the least: a new generation concatenative synthesis system

The paper describes a corpus-based approach applied in the evolution of ELOQUENS, the CSELT text-to-speech synthesis system for Italian, towards multi-voice, multi-language, high-naturalness concatenative synthesis. The acoustic modules have been redesigned around the idea of reducing both the number of junctions and the need for prosodic modification. Appropriate phonetic coverage methods were applied in the design of the acoustic database. Automatic processing tools performed phone and diphone segmentation, pitch marking, and prosodic feature detection. The synthesis algorithm makes the most of the speech material by searching the database for the longest suitable sequences, according to weighted distance measures on phonetic and prosodic parameters. Signal modification techniques are applied only where necessary, to smooth residual prosodic jumps at unit boundaries. The resulting voice is quite human-sounding.

Keyword: corpus-based concatenative synthesis
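The search strategy summarized in the abstract (prefer the longest suitable database sequences, judged by a weighted distance on phonetic and prosodic parameters) can be illustrated with a minimal sketch. This is not the ELOQUENS implementation: the `Unit` fields, the `WEIGHTS` values, the `max_cost` threshold, and the greedy left-to-right search are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str   # phonetic label
    f0: float    # pitch in Hz (hypothetical prosodic feature)
    dur: float   # duration in ms (hypothetical prosodic feature)


# Hypothetical weights for the prosodic terms of the distance.
WEIGHTS = {"f0": 1.0, "dur": 0.5}


def distance(target: Unit, cand: Unit) -> float:
    """Weighted prosodic distance; infinite when the phone labels differ."""
    if target.phone != cand.phone:
        return float("inf")
    return (WEIGHTS["f0"] * abs(target.f0 - cand.f0) / 100.0
            + WEIGHTS["dur"] * abs(target.dur - cand.dur) / 100.0)


def longest_match(targets, database, start, max_cost):
    """Find the longest database run matching targets[start:].

    A run is extended while each per-unit distance stays <= max_cost.
    Ties on length are broken by the lower total distance.
    Returns (database_index, length); length is 0 if nothing matches.
    """
    best_idx, best_len, best_total = None, 0, float("inf")
    for i in range(len(database)):
        length, total = 0, 0.0
        while start + length < len(targets) and i + length < len(database):
            d = distance(targets[start + length], database[i + length])
            if d > max_cost:
                break
            total += d
            length += 1
        if length > best_len or (length == best_len and length > 0
                                 and total < best_total):
            best_idx, best_len, best_total = i, length, total
    return best_idx, best_len


def select_units(targets, database, max_cost=0.5):
    """Greedily cover the target sequence with the longest suitable runs."""
    segments = []
    pos = 0
    while pos < len(targets):
        idx, length = longest_match(targets, database, pos, max_cost)
        if length == 0:
            raise ValueError(f"no suitable unit for {targets[pos].phone!r}")
        segments.append((idx, length))
        pos += length
    return segments


# Toy data: one recorded sequence and a target that reuses parts of it.
database = [Unit(p, 120.0, 80.0) for p in "casamare"]
targets = [Unit(p, 125.0, 85.0) for p in "mareca"]
segments = select_units(targets, database)
print(segments, "junctions:", len(segments) - 1)
```

Each returned pair is a start index in the database and a run length, so fewer pairs mean fewer junctions. A greedy search is used here for brevity; a real system might instead optimize the whole sequence, and would apply signal modification only at the remaining boundaries.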

Silvia Quazza | Marcello Balestri | Alberto Pacchiotti | Pier Luigi Salza | Stefano Sandri

[1] Marcello Balestri, et al. The CSELT system for Italian text-to-speech synthesis, 1993, EUROSPEECH.

[2] Alan W. Black, et al. Prosody and the Selection of Source Units for Concatenative Synthesis, 1997.

[3] Graziano Tisato, et al. A text-to-speech system for Italian, 1984, ICASSP.

[4] S. Quazza, et al. A specialised speech synthesis technique for application to automatic reverse directory service, 1998, Proceedings 1998 IEEE 4th Workshop Interactive Voice Technology for Telecommunications Applications (IVTTA '98).

[5] Claudia Barolo, et al. Automatic diphone extraction for an Italian text-to-speech synthesis system, 1997, EUROSPEECH.

[6] Roberto Billi, et al. Interactive voice technology at work: The CSELT experience, 1995, Speech Commun.

[7] Jan P. H. van Santen, et al. Methods for optimal text selection, 1997, EUROSPEECH.