European speech databases for telephone applications

The SpeechDat project aims to produce speech databases for all official languages of the European Union, together with some major dialectal variants and minority languages, for a total of 28 databases. The databases will be recorded over fixed and mobile telephone networks. They will provide a realistic basis for training and assessing systems on both isolated and continuous-speech utterances, using whole-word or subword approaches, and can therefore be used to develop voice-driven teleservices, including speaker verification. The database specification has been developed jointly and is essentially the same for each language, to facilitate dissemination and use. Variation among speakers will be controlled with respect to sex, age, dialect, calling environment, and similar factors. All databases will be validated centrally. The SpeechDat databases will be transferred to ELRA for distribution. The next databases to be recorded will cover East European languages.
