The presence of appropriate acoustic cues of affective features in synthesized speech can be a prerequisite for the message recipient to properly evaluate its semantic content. In recent work, the authors have focused on expressive speech synthesis capable of generating natural-sounding synthetic speech at various levels of arousal. Automatic information and warning systems can be used to inform, warn, instruct, and navigate people in dangerous, critical situations, and to increase the effectiveness of crisis management and rescue operations. One of the activities within the EU SF project CRISIS was called "Extremely expressive (hyper-expressive) speech synthesis for urgent warning messages generation". It was aimed at the research and development of speech synthesizers with high naturalness and intelligibility, capable of generating messages with various expressive loads. The synthesizers will be applicable to generating public alert and warning messages in case of fires, floods, state security threats, etc. Early warning in such situations is made possible by fire and flood spread forecasting; the modeling involved is covered by other activities of the CRISIS project. The most important component needed to build the synthesizer is the expressive speech database. An original method is proposed for creating such a database. The current version of the expressive speech database is introduced, and the first experiments with expressive synthesizers developed with this database are presented and discussed.